template defining a circuit for executing the respective algorithm portion, and interconnects the identified templates such that the
interconnected templates define a circuit that is operable to execute the algorithm. As compared to prior design tools, this tool may
decrease the time and effort required to design a circuit for instantiation on a programmable logic integrated circuit (PLIC) or on an
application-specific integrated circuit (ASIC) by allowing one to construct the circuit from previously written templates that define
previously tested and debugged circuits. A library includes one or more circuit templates and an interface template. The one or more
circuit templates each define a respective circuit operable to execute a respective algorithm or portion thereof. And the interface
template defines a hardware layer operable to interface one of the circuits to pins of a programmable logic circuit when the layer and
the one circuit are instantiated on the programmable logic circuit. Such a library may shorten the time and reduce the effort that an
engineer expends designing a circuit for instantiation on a PLIC or ASIC by allowing the engineer to build the circuit from templates

COMPUTER-BASED TOOL AND METHOD FOR DESIGNING AN
ELECTRONIC CIRCUIT AND RELATED SYSTEM AND LIBRARY FOR SAME

Claim of priority 

[1] This application claims priority to U.S. Provisional Application
Serial Nos. 60/615,192, 60/615,157, 60/615,170, 60/615,158, 60/615,193,
and 60/615,050, filed on 01 October 2004, which are incorporated by
reference.

Cross reference to related applications 

[2] This application is related to U.S. Patent Application Serial
Nos. .... (Attorney Docket Nos. 1934-021-03, 1934-024-03, 1934-025-03,
1934-026-03, 1934-031-03, 1934-035-03, and 1934-036-03), which have a
common filing date of 03 October 2005 and a common assignee, and which
are incorporated by reference.

Background

[3] Electronics engineers often instantiate circuits, such as logic
circuits, on programmable logic integrated circuits (PLICs), such as
field-programmable gate arrays (FPGAs), and on application-specific
integrated circuits (ASICs). Because an engineer typically configures the
circuit components and interconnections inside a PLIC with firmware, he
can modify a circuit instantiated on the PLIC merely by modifying and
reloading the firmware. An example of a computer architecture that exploits
the ability to configure and reconfigure circuitry within a PLIC with firmware
is described in U.S. Patent Publication No. 2004/0133763, which is
incorporated herein by reference.

[4] But unfortunately, it is often difficult and time consuming to
design a circuit for instantiation on a PLIC, and the design difficulty and
the time required to complete the design often increase with the routing
resources, component density, and component variety on the PLIC.

[5] Comparatively, when a software programmer writes source
code for a software application, he can often save time by incorporating into
the application previously written and debugged software objects from a
software-object library. Suppose the programmer wishes to write a
software application that solves for y in the following equation:

(1)    $y = x^2 + z^3$

Further suppose that a software-object library includes a first software
object for squaring a value (here x), a second software object for cubing a
value (here z), and a third software object for summing two values (here
$x^2$ and $z^3$). When the programmer incorporates pointers to these three
objects in the source code, a compiler effectively merges these objects into
the software application while compiling the source code. Therefore, the
object library allows the programmer to write the software application in a
shorter time and with less effort because the programmer does not have to
"reinvent the wheel" by writing and debugging pieces of source code that
respectively square x, cube z, and sum $x^2$ and $z^3$. Furthermore, if the
programmer needs to modify the software application, he can do so without
modifying and re-debugging the first, second, and third software objects.
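
As a minimal sketch of the reuse described in paragraph [5], the Python fragment below builds a solver for equation (1) from three small library functions. The names `square`, `cube`, `add`, and `solve_y` are hypothetical stand-ins for the first, second, and third software objects and do not come from any particular library.

```python
# Hypothetical stand-ins for the three previously debugged library objects.
def square(value: float) -> float:
    """First software object: squares a value."""
    return value * value


def cube(value: float) -> float:
    """Second software object: cubes a value."""
    return value * value * value


def add(a: float, b: float) -> float:
    """Third software object: sums two values."""
    return a + b


def solve_y(x: float, z: float) -> float:
    """Equation (1), y = x^2 + z^3, composed only from library objects."""
    return add(square(x), cube(z))


if __name__ == "__main__":
    print(solve_y(2.0, 3.0))  # 2^2 + 3^3 = 31.0
```

Changing the application, for example to compute a different sum, only touches `solve_y`; the three library functions stay as written and tested.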

[6] In contrast, there are typically no time- or effort-saving
equivalents of software objects available to a hardware engineer who
wishes to design a circuit for instantiation on a PLIC; consequently, when a
hardware engineer designs a circuit for instantiation on a PLIC, he typically
must write the source code (e.g., in a hardware description language (HDL)
such as Verilog or VHDL) "from scratch." Suppose that an engineer wishes to
design a logic circuit that solves for y in equation (1). Because there are
typically no hardware equivalents of the first, second, and third software
objects described in the preceding paragraph, the engineer may write source
code that describes first and second portions of a circuit for solving
equation (1). The first circuit portion squares x, cubes z, and sums $x^2$
and $z^3$, and the second circuit portion interfaces the first circuit portion
to the external pins of the PLIC. The engineer then compiles the source code
with a PLIC design tool (typically provided by the PLIC manufacturer), which
synthesizes and routes the circuit and then generates the configuration
firmware that, when loaded into the PLIC, instantiates the circuit. Next, the
engineer loads the firmware into the PLIC and debugs the instantiated
circuit. Unfortunately, the synthesizing and routing steps are often not
trivial, and may take a number of hours or even days depending upon the size
and complexity of the circuit. And even if the engineer makes only a minor
modification to a small portion of the circuit, he typically must repeat the
synthesizing, routing, and debugging steps for the entire circuit.

[7] Another factor that may add to the time and effort that an
engineer expends while designing a circuit for instantiation on a PLIC is that
a PLIC design tool typically recognizes only hardware-specific source code.
Suppose that a mathematician, who writes an equation using mathematical
symbols (e.g., "+," "$x^2$," and "$z^5$"), wishes to instantiate on a PLIC a
circuit that solves for a variable in a complex equation that includes, e.g.,
partial derivatives and integrations. Because a PLIC design tool typically
recognizes few, if any, mathematical symbols, the mathematician often must
explain the equation and the desired operating parameters (e.g., latency and
precision) of the circuit to a hardware engineer, who then translates the
equation and operating parameters into source code that the design tool
recognizes. These explanation and translation steps are often time consuming
and difficult, particularly where the equation is mathematically complex or
the circuit has stringent operating parameters (e.g., high speed, high
precision).

[8] Therefore, a need has arisen for a new methodology and for a
new tool for designing a circuit for instantiation on a PLIC.

Summary 

[9] According to an embodiment of the invention, a
computer-based circuit design tool includes a front end, an interpreter
coupled to the front end, and an integrator coupled to the interpreter. The
front end receives symbols that define a logical expression, and the
interpreter parses the expression into respective portions. The integrator
identifies a corresponding circuit template for each of the expression
portions, and logically interconnects the identified templates into a
representation of an electronic circuit that is operable to execute the
expression.
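
The front-end, interpreter, and integrator flow of paragraph [9] can be sketched in software as follows. This is only an illustration under stated assumptions, not the tool's implementation: Python's standard `ast` module stands in for the interpreter, and the template names in `TEMPLATE_LIBRARY`, the netlist tuple layout, and the function names are all hypothetical.

```python
import ast

# Hypothetical library: maps an expression portion (an operator) to the
# name of a previously debugged circuit template.
TEMPLATE_LIBRARY = {
    "Add": "adder_template",
    "Mult": "multiplier_template",
    "Pow": "power_template",
}


def interpret(expression: str) -> ast.expr:
    """Interpreter: parse the received symbols into a tree of portions."""
    return ast.parse(expression, mode="eval").body


def integrate(node: ast.expr, netlist: list) -> str:
    """Integrator: select a template for each portion and record how the
    templates interconnect. Returns the signal carrying this portion's result."""
    if isinstance(node, ast.Name):
        return node.id  # an input variable such as x or z
    if isinstance(node, ast.Constant):
        return str(node.value)  # a literal such as the exponent 2
    if isinstance(node, ast.BinOp):
        left = integrate(node.left, netlist)
        right = integrate(node.right, netlist)
        template = TEMPLATE_LIBRARY[type(node.op).__name__]  # KeyError if absent
        output = f"n{len(netlist)}"
        netlist.append((template, (left, right), output))
        return output
    raise ValueError(f"unsupported expression portion: {ast.dump(node)}")


def design(expression: str):
    """Front end to netlist: returns (interconnected templates, output signal)."""
    netlist = []
    output = integrate(interpret(expression), netlist)
    return netlist, output


if __name__ == "__main__":
    for entry in design("x**2 + z**3")[0]:
        print(entry)
```

For the expression of equation (1), the sketch selects two power templates and one adder template and wires the outputs of the former into the inputs of the latter.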

[10] As compared to prior circuit design tools, such a tool may
shorten the time and reduce the effort that an engineer expends designing
a circuit for instantiation on a PLIC by allowing the engineer to build the
circuit from templates of previously designed and debugged circuits.

[11] According to a related embodiment of the invention, the front
end of the design tool recognizes mathematical symbols so that one can
design a PLIC circuit for executing a mathematical expression with little or
no assistance from a hardware engineer.

[12] According to an embodiment of the invention, a library includes
one or more circuit templates and an interface template. The one or more
circuit templates each define a respective circuit operable to execute a
respective algorithm or portion thereof. And the interface template defines
a hardware layer operable to interface one of the circuits to pins of a
programmable logic circuit when the layer and the one circuit are
instantiated on the programmable logic circuit.

[13] Such a library may shorten the time and reduce the effort that
an engineer expends designing a circuit for instantiation on a PLIC or ASIC
by allowing the engineer to build the circuit from templates of previously
designed and debugged circuits.

Brief Description of the Drawings 

[14] FIG. 1 is a block diagram of a peer-vector computing machine
having a pipelined accelerator that one can design with a design tool
according to an embodiment of the invention.

[15] FIG. 2 is a block diagram of a pipeline unit that includes a PLIC
and that can be included in the pipelined accelerator of FIG. 1 according to
an embodiment of the invention.

[16] FIG. 3 is a diagram of the circuit layers that compose the
hardware interface layer within the PLIC of FIG. 2 according to an
embodiment of the invention.

[17] FIG. 4 is a block diagram of the circuitry that composes the
interface adapter and framework services layers of FIG. 3 according to an
embodiment of the invention.

[18] FIG. 5 is a diagram of a hardware-description file for a circuit
that one can instantiate on a PLIC according to an embodiment of the
invention.

[19] FIG. 6 is a block diagram of a PLIC circuit-template library
according to an embodiment of the invention.

[20] FIG. 7 is a block diagram of a circuit-design system that includes
a computer-based tool for designing a circuit using templates from the
library of FIG. 6 according to an embodiment of the invention.

[21] FIG. 8 illustrates the parsing of a mathematical expression
according to an embodiment of the invention.

[22] FIG. 9 illustrates a table of hardwired-pipeline library templates
corresponding to the hardwired pipelines available for executing respective
portions of the parsed mathematical expression of FIG. 8 according to an
embodiment of the invention.

[23] FIG. 10 is a block diagram of a circuit that the tool of FIG. 7
generates from circuit templates downloaded from the library of FIG. 6
according to an embodiment of the invention.

[24] FIG. 11 is a block diagram of a circuit that the tool of FIG. 7
generates from circuit templates downloaded from the library of FIG. 6
according to another embodiment of the invention.

[25] FIG. 12 is a block diagram of a circuit that the tool of FIG. 7
generates from circuit templates downloaded from the library of FIG. 6
according to yet another embodiment of the invention.

[26] FIG. 13 is a block diagram of a circuit that the tool of FIG. 7
generates for implementing a function as a series expansion according to
an embodiment of the invention.

[27] FIG. 14 is a block diagram of a circuit that the tool of FIG. 7
generates for implementing the function of FIG. 13 as a series expansion
according to another embodiment of the invention.

[28] FIG. 15 is a block diagram of a power-of-x term generator that
the tool of FIG. 7 generates as a replacement for the power-of-x multipliers
of FIGS. 13 and 14 according to an embodiment of the invention.

[29] FIG. 16 is a block diagram of a circuit that the tool of FIG. 7
generates for implementing another function as a series expansion
according to an embodiment of the invention.

[30] FIG. 17 is a block diagram of a sign determiner from FIG. 16
according to an embodiment of the invention.

DETAILED DESCRIPTION 

Introduction

[31] A computer-based circuit design tool according to an
embodiment of the invention is discussed below in conjunction with
FIGS. 7 - 10.

[32] But first, an overview of concepts that are related to the design
tool according to an embodiment of the invention is presented in
conjunction with FIGS. 1 - 6. An understanding of these concepts should
facilitate the reader's understanding of the design tool.

Overview Of Concepts Related To Design Tool 

[33] FIG. 1 is a schematic block diagram of a computing machine
10, which has a peer-vector architecture according to an embodiment of the
invention. In addition to a host processor 12, the peer-vector machine 10
includes a pipelined accelerator 14, which is operable to process at least a
portion of the data processed by the machine 10. Therefore, the host
processor 12 and the accelerator 14 are "peers" that can transfer data
messages back and forth. Because the accelerator 14 includes hardwired
logic circuits instantiated on one or more PLICs, it executes few, if any,
program instructions, and thus typically performs mathematically intensive
operations on data significantly faster than a bank of computer processors
can for a given clock frequency. Consequently, by combining the
decision-making ability of the processor 12 and the number-crunching
ability of the accelerator 14, the machine 10 has the same abilities as, but
can often process data faster than, a conventional processor-based
computing machine. Furthermore, as discussed below and in U.S. Patent
Publication No. 2004/0136241, which is incorporated by reference,
providing the accelerator 14 with a communication interface that is
compatible with the interface of the host processor 12 facilitates the design
and modification of the machine 10, particularly where the communication
interface is an industry standard. And where the accelerator 14 includes
multiple pipeline units (FIG. 2), providing each of these units with this
compatible communication interface facilitates the design and modification
of the accelerator, particularly where the communication interface is an
industry standard. Moreover, the machine 10 may also provide other
advantages as described in the following other patent publications, which
are incorporated by reference: 2004/0133763; 2004/0181621;
2004/0170070; and 2004/0130927.

[34] Still referring to FIG. 1, in addition to the host processor 12 and
the pipelined accelerator 14, the peer-vector computing machine 10
includes a processor memory 16, an interface memory 18, a bus 20, a
firmware memory 22, an optional raw-data input port 24, an optional
processed-data output port 26, and an optional router 31.

[35] The host processor 12 includes a processing unit 32 and a
message handler 34, and the processor memory 16 includes a
processing-unit memory 36 and a handler memory 38, which respectively
serve as both program and working memories for the processing unit and
the message handler. The processor memory 36 also includes an
accelerator-configuration registry 40 and a message-configuration
registry 42, which store respective configuration data that allow the host
processor 12 to configure the functioning of the accelerator 14 and the
structure of the messages that the message handler 34 sends and
receives.

[36] The pipelined accelerator 14 includes at least one PLIC (FIG.
2) on which are disposed hardwired pipelines $44_1$ - $44_n$, which process
respective data while executing few, if any, program instructions. The
firmware memory 22 stores the configuration firmware for the PLIC(s) of the
accelerator 14. If the accelerator 14 is disposed on multiple PLICs, these
PLICs and their respective firmware memories may be disposed on multiple
circuit boards that are often called daughter cards or pipeline units (FIG. 2).
The accelerator 14 and pipeline units are discussed further in previously
incorporated U.S. Patent Publication Nos. 2004/0136241, 2004/0181621,
and 2004/0130927. The pipeline units are also discussed below in
conjunction with FIGS. 2 - 4.

[37] Generally, in one mode of operation of the peer-vector
computing machine 10, the pipelined accelerator 14 receives data from one
or more software applications running on the host processor 12, processes
this data in a pipelined fashion with one or more logic circuits that execute
one or more mathematical algorithms, and then returns the resulting data to
the application(s). As stated above, because the logic circuits execute few,
if any, software instructions, they often process data one or more orders of
magnitude faster than the host processor 12. Furthermore, because the
logic circuits are instantiated on one or more PLICs, one can modify these
circuits merely by modifying the firmware stored in the firmware memory 22;
that is, one need not modify the hardware components of the accelerator 14
or the interconnections between these components. The operation of the
peer-vector machine 10 is further discussed in previously incorporated U.S.
Patent Publication No. 2004/0133763, the functional topology and operation
of the host processor 12 is further discussed in previously incorporated U.S.
Patent Publication No. 2004/0181621, and the topology and operation of
the accelerator 14 is further discussed in previously incorporated U.S.
Patent Publication No. 2004/0136241.

[38] FIG. 2 is a diagram of a pipeline unit 50 of the pipelined
accelerator 14 of FIG. 1 according to an embodiment of the invention.

[39] The pipeline unit 50 includes a circuit board 52 on which are disposed
the firmware memory 22, a platform-identification memory 54, a bus
connector 56, a data memory 58, and a PLIC 60.

[40] As discussed above in conjunction with FIG. 1, the firmware
memory 22 stores the configuration firmware that the PLIC 60 downloads to
instantiate one or more logic circuits.

WO 2006/03971«

PCT/US2005/035813



[41] The platform memory 54 stores a value that identifies the one
or more platforms with which the pipeline unit 50 is compatible. Generally,
a platform specifies a unique set of physical attributes that a pipeline unit
may possess. Examples of these attributes include the number of external
pins (not shown) on the PLIC 60, the width of the bus connector 56, the size
of the PLIC, and the size of the data memory. Consequently, a pipeline unit
50 is compatible with a platform if the unit possesses all of the attributes
that the platform specifies. So a pipeline unit 50 having a bus connector 56
with thirty-two bits is incompatible with a platform that specifies a bus
connector with sixty-four bits. Some platforms may be compatible with the
peer-vector machine 10 (FIG. 1), and others may be incompatible.
Therefore, the platform identifier stored in the memory 54 may allow the
host processor 12 (FIG. 1) to determine whether the pipeline unit 50 is
compatible with the platforms supported by the machine 10. And where the
pipeline unit 50 is so compatible, the platform identifier may also allow the
host processor 12 to determine how to configure the PLIC 60 or other
portions of the pipeline unit.
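
The compatibility rule of paragraph [41] (a unit is compatible with a platform only if it possesses every attribute that the platform specifies) can be illustrated with a short Python sketch. The attribute names and the numeric values below are hypothetical and chosen only to mirror the thirty-two-bit versus sixty-four-bit example.

```python
def is_compatible(unit_attributes: dict, platform_spec: dict) -> bool:
    """Return True only if the unit possesses every attribute, with the
    required value, that the platform specifies."""
    return all(
        name in unit_attributes and unit_attributes[name] == required
        for name, required in platform_spec.items()
    )


# Hypothetical pipeline unit with a thirty-two-bit bus connector.
unit = {"bus_width_bits": 32, "plic_pins": 676, "data_memory_mbytes": 64}

# Hypothetical platforms: one specifies a 64-bit connector, one a 32-bit one.
platform_a = {"bus_width_bits": 64}
platform_b = {"bus_width_bits": 32, "plic_pins": 676}

if __name__ == "__main__":
    print(is_compatible(unit, platform_a))  # False: bus widths differ
    print(is_compatible(unit, platform_b))  # True: all specified attributes match
```

A host could run such a check against each platform it supports before deciding how to configure the unit.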

[42] The bus connector 56 is a physical connector that interfaces
the PLIC 60, and perhaps other components of the pipeline unit 50, with the
pipeline bus 20 of FIG. 1.

[43] The data memory 58 acts as a buffer for storing data that the
pipeline unit 50 receives from the host processor 12 (FIG. 1) and for
providing this data to the PLIC 60. The data memory 58 may also act as a
buffer for storing data that the PLIC 60 generates for sending to the host
processor 12, or as a working memory for the hardwired pipelines 44.

[44] Instantiated on the PLIC 60 are logic circuits that compose the
hardwired pipeline(s) 44 and a hardware interface layer 62, which interfaces
the hardwired pipelines to the external pins (not shown) of the PLIC 60, and
which thus interfaces the pipelines to the pipeline bus 20 (via the connector
56), the firmware and platform-identification memories 22 and 54, and the
data memory 58. Because the topology of the interface layer 62 is primarily
dependent upon the attributes specified by the platform(s) with which the
pipeline unit 50 is compatible, one can often modify the pipeline(s) 44
without modifying the interface layer. For example, if a platform with which
the pipeline unit 50 is compatible specifies a thirty-two-bit bus, then the
interface layer 62 provides a thirty-two-bit bus connection to the bus
connector 56 regardless of the topology or other attributes of the pipeline(s)
44. Consequently, as discussed below in conjunction with FIGS. 7 - 10, an
embodiment of the computer-based design tool allows one to design and
debug the pipeline(s) 44 independently of the interface layer 62, and vice
versa.

[45] Still referring to FIG. 2, alternate embodiments of the pipeline
unit 50 are contemplated. For example, the memory 54 may be omitted,
and the platform identifier may be stored in the firmware memory 22, or by a
jumper-configurable or hardwired circuit (not shown).

[46] A pipeline unit similar to the unit 50 is discussed in previously
incorporated U.S. Patent Publication No. 2004/0136241.

[47] FIG. 3 is a diagram of the hardware layers that compose the
hardware interface layer 62 within the PLIC 60 of FIG. 2 according to an
embodiment of the invention. The hardware interface layer 62 includes
three layers of circuitry that are instantiated on the PLIC 60: an
interface-adapter layer 70, a framework-services layer 72, and a
communication layer 74, which is hereinafter called a communication shell.
The interface-adapter layer 70 includes circuitry, e.g., buffers and latches,
that interfaces the framework-services layer 72 to the external pins (not
shown) of the PLIC 60. The framework-services layer 72 provides a set of
services to the hardwired pipeline(s) 44 via the communication shell 74.
For example, the layer 72 may synchronize data transfer between the
pipeline(s) 44, the pipeline bus 20 (FIG. 1), and the data memory 58 (FIG.
2), and may control the sequence(s) in which the pipeline(s) operate. The
communication shell 74 includes circuitry, e.g., latches, that interfaces the
framework-services layer 72 to the pipeline(s) 44.

[48] Still referring to FIG. 3, alternate embodiments of the
hardware-interface layer 62 are contemplated. For example, although the
framework-services layer 72 is shown as isolating the interface-adapter
layer 70 from the communication shell 74, the interface-adapter layer may,
at least at some circuit nodes, be directly coupled to the communication
shell. Furthermore, although the communication shell 74 is shown as
isolating the interface-adapter layer 70 and the framework-services layer 72
from the hardwired pipeline(s) 44, the interface-adapter layer or the
framework-services layer may, at least at some circuit nodes, be directly
coupled to the pipeline(s).

[49] FIG. 4 is a schematic block diagram of the circuitry that
composes the interface-adapter layer 70 and the framework-services layer
72 of FIG. 3 according to an embodiment of the invention.

[50] A communication interface 80 and an optional
industry-standard bus interface 82 compose the interface-adapter layer 70,
and a controller 84, exception manager 86, and configuration manager 88
compose the framework-services layer 72.

[51] The communication interface 80 transfers data between a peer,
such as the host processor 12 (FIG. 1) or another pipeline unit 50 (FIG. 2),
and the firmware memory 22, the platform-identifier memory 54, the data
memory 58, and the following components instantiated within the PLIC 60:
the hardwired pipelines 44 (via the communication shell 74), the controller
84, the exception manager 86, and the configuration manager 88. If
present, the optional industry-standard bus interface 82 couples the
communication interface 80 to the bus connector 56. Alternatively, the
interfaces 80 and 82 may be combined such that the functionality of the
interface 82 is included within the communication interface 80.

[52] The controller 84 synchronizes the hardwired pipelines $44_1$ -
$44_n$ and monitors and controls the sequence in which they perform the
respective data operations in response to communications, i.e., "events,"
from other peers. For example, a peer such as the host processor 12 may
send an event to the pipeline unit 50 via the pipeline bus 20 to indicate that
the peer has finished sending a block of data to the pipeline unit and to
cause the hardwired pipelines $44_1$ - $44_n$ to begin processing this data. An
event that includes data is typically called a message, and an event that
does not include data is typically called a "door bell."
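
The distinction between a message and a door bell in paragraph [52] can be sketched as a small software model. This is an assumption-laden illustration rather than the controller 84 itself: the `Event` and `Controller` classes, and the policy of buffering message data until a door bell arrives, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """A communication from a peer; `data` is None for a door bell."""
    sender: str
    data: Optional[bytes] = None

    @property
    def kind(self) -> str:
        # An event that includes data is a message; otherwise a door bell.
        return "message" if self.data is not None else "door bell"


@dataclass
class Controller:
    """Hypothetical model: buffer message data, process on a door bell."""
    buffer: List[bytes] = field(default_factory=list)
    processed: List[bytes] = field(default_factory=list)

    def handle(self, event: Event) -> None:
        if event.kind == "message":
            self.buffer.append(event.data)
        else:
            # The peer has finished sending a block: start the pipelines
            # (modeled here as simply recording the assembled block).
            self.processed.append(b"".join(self.buffer))
            self.buffer.clear()


if __name__ == "__main__":
    controller = Controller()
    for ev in (Event("host", b"ab"), Event("host", b"cd"), Event("host")):
        controller.handle(ev)
    print(controller.processed)
```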

[53] The exception manager 86 monitors the status of the hardwired
pipelines $44_1$ - $44_n$, the communication interface 80, the communication
shell 74, the controller 84, and the bus interface 82 (if present), and reports
exceptions to the host processor 12 (FIG. 1). For example, if a buffer (not
shown) in the communication interface 80 overflows, then the exception
manager 86 reports this to the host processor 12. The exception manager
may also correct, or attempt to correct, the problem giving rise to the
exception. For example, for an overflowing buffer, the exception manager
86 may increase the size of the buffer, either directly or via the configuration
manager 88 as discussed below.

[54] The configuration manager 88 sets the "soft" configuration of
the hardwired pipelines $44_1$ - $44_n$, the communication interface 80, the
communication shell 74, the controller 84, the exception manager 86, and
the interface 82 (if present) in response to soft-configuration data from the
host processor 12 (FIG. 1). As discussed in previously incorporated U.S.
Patent Publication No. 2004/0133763, the "hard" configuration of a
component within the PLIC 60 denotes the actual instantiation, on the
transistor and circuit-block level, of the component, and the soft
configuration denotes the physical parameters (e.g., data width, table size)
of the instantiated component. That is, soft-configuration data is similar to
the data that one can load into a register of a processor (not shown in FIG.
4) to set the operating mode (e.g., burst-memory mode) of the processor.
For example, the host processor 12 may send to the PLIC 60
soft-configuration data that causes the configuration manager 88 to set the
number and respective priority levels of queues (not shown) within the
communication interface 80. The exception manager 86 may also send
soft-configuration data that causes the configuration manager 88 to, e.g.,
increase the size of an overflowing buffer in the communication interface
80.
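
The interplay of paragraphs [53] and [54], in which the exception manager reports an overflow and asks the configuration manager to enlarge the buffer, can be modeled as follows. The class names echo the components in the text, but every method, the `buffer_size` parameter, and the choice to double the buffer are hypothetical assumptions made for this sketch.

```python
class CommunicationInterface:
    """Hypothetical model of a component with one soft parameter."""

    def __init__(self, buffer_size: int):
        self.buffer_size = buffer_size  # soft configuration
        self.queue = []

    def push(self, item) -> bool:
        """Return False on overflow instead of storing the item."""
        if len(self.queue) >= self.buffer_size:
            return False
        self.queue.append(item)
        return True


class ConfigurationManager:
    def apply(self, component, soft_config: dict) -> None:
        """Set soft parameters; the instantiated component is not rebuilt."""
        for name, value in soft_config.items():
            if not hasattr(component, name):
                raise AttributeError(f"unknown soft parameter: {name}")
            setattr(component, name, value)


class ExceptionManager:
    def __init__(self, config_manager: ConfigurationManager, report):
        self.config_manager = config_manager
        self.report = report  # callable standing in for the host processor

    def on_overflow(self, interface: CommunicationInterface) -> None:
        self.report(f"buffer overflow at size {interface.buffer_size}")
        # Assumed recovery policy: double the buffer via soft configuration.
        self.config_manager.apply(
            interface, {"buffer_size": interface.buffer_size * 2}
        )


if __name__ == "__main__":
    reports = []
    iface = CommunicationInterface(buffer_size=2)
    manager = ExceptionManager(ConfigurationManager(), reports.append)
    for item in (1, 2, 3):
        if not iface.push(item):
            manager.on_overflow(iface)
            iface.push(item)
    print(reports, iface.buffer_size, iface.queue)
```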

[55] The communication interface 80, optional industry-standard
bus interface 82, controller 84, exception manager 86, and configuration
manager 88 are further discussed in previously incorporated U.S. Patent
Publication No. 2004/0136241.

[56] Referring again to FIG. 2, although the pipeline unit 50 is
disclosed as including only one PLIC 60, the pipeline unit may include
multiple PLICs. For example, as discussed in previously incorporated U.S.
Patent Publication No. 2004/0136241, the pipeline unit 50 may include two
interconnected PLICs, where the circuitry that composes the
interface-adapter layer 70 and framework-services layer 72 is instantiated
on one of the PLICs, and the circuitry that composes the communication
shell 74 and the hardwired pipelines 44 is instantiated on the other PLIC.

[57] FIG. 5 is a diagram of a hardware-description file 100 from
which a conventional PLIC synthesizer and router tool (not shown) can
generate the configuration firmware for the PLIC 60 of FIGS. 2 - 4
according to an embodiment of the invention. Typically, the
hardware-description file 100 includes templates that are written in a
conventional hardware description language (HDL) such as Verilog® HDL.
The top-down structure of the file 100 resembles the top-down structure of
software source code that incorporates software objects. Such a top-down
structure for software source code provides at least two advantages. First,
it allows a programmer to avoid writing and debugging source code for a
function when a software object that performs the function has already been
written and debugged. Second, it allows the programmer to change or add
a function by modifying an existing object or writing a new object with little
or no rewriting and debugging of the source code that incorporates the
object. As discussed below, the top-down structure of the file 100 provides
similar advantages. For example, it allows one to incorporate in the file 100
existing templates that define an already-debugged hardware-interface
layer 62 (FIGS. 2 - 3). Furthermore, it allows one to change an existing
hardwired pipeline 44 or to add to a circuit a new hardwired pipeline 44 with
little or no rewriting and debugging of the templates that define the layer 62.

[58] The hardware-description file 100 includes a top-level template
101, which includes respective top-level definitions 102, 104, and 106 of the
interface-adapter layer 70, the framework-services layer 72, and the
communication shell 74 (collectively the hardware-interface layer 62) of the
PLIC 60 (FIGS. 2 - 4). The template 101 also defines the connections
between the external pins (not shown) of the PLIC 60 and the
interface-adapter layer 70 (and in some cases the framework-services layer
72), and also defines the connections between the framework-services layer
(and in some cases the interface-adapter layer) and the communication
shell 74.

[59] The top-level definition 102 of the interface-adapter layer 70
(FIGS. 3 - 4) incorporates an interface-adapter-layer template 108, which
further defines the portions of the interface-adapter layer defined by the
top-level definition 102. For example, suppose that the top-level definition
102 defines a data-input buffer (not shown) in terms of its input and output
nodes. That is, suppose the top-level definition 102 defines the data-input
buffer as a functional block having defined input and output nodes. The
template 108 defines the circuitry that composes this functional buffer block,
and defines the connections between this circuitry and the buffer input
nodes and output nodes recited in the top-level definition 102.
Furthermore, the template 108 may incorporate one or more lower-level
templates 109 that further define the data buffer or other components of the
interface-adapter layer 70 recited in the template 108. Moreover, these one
or more lower-level templates 109 may each incorporate one or more even
lower-level templates (not shown), and so on, until all portions of the
interface-adapter layer 70 are defined in terms of circuit components (e.g.,
flip-flops, logic gates) that the PLIC synthesizing and routing tool (not
shown) recognizes.

[60] Similarly, the top-level definition 104 of the framework-services
layer 72 (FIGS. 3 - 4) incorporates a framework-services-layer template
110, which further defines the portions of the framework-services layer
defined by the definition 104. For example, suppose the top-level definition
104 defines a counter (not shown) in terms of its input and output nodes.
The template 110 defines the circuitry that composes this counter, and
defines the connections between this circuitry and the counter input and
output nodes recited by the top-level definition 104. Furthermore, the
template 110 may incorporate a hierarchy of one or more lower-level
templates 111 and even lower-level templates (not shown), and so on, such
that all portions of the framework-services layer 72 are, at some level of the
hierarchy, defined in terms of circuit components (e.g., flip-flops, logic
gates) that the PLIC synthesizing and routing tool recognizes. For
example, suppose the template 110 defines the counter as including a
count-up/down-selector circuit having input and output nodes. The
template 110 may incorporate a lower-level template 111 that defines the
circuitry within the selector circuit and defines the connections between this
circuitry and the selector circuit's input and output nodes defined by the
template 110.

[61] Likewise, the top-level definition 106 of the communication
shell 74 (FIGS. 3 - 4) incorporates a communication-shell template 112,
which further defines the portions of the communication shell defined by the
definition 106 and which also includes a top-level definition 113 of the
hardwired pipeline(s) 44 disposed within the communication shell. For
example, the definition 113 defines the connections between the
communication shell 74 and the hardwired pipeline(s) 44.

[62] The top-level definition 113 of the hardwired pipeline(s) 44
(FIGS. 3 - 4) incorporates one or more hardwired-pipeline templates 114,
which further define the portions of the hardwired pipeline(s) 44 defined by
the definition 113. The template or templates 114 may each incorporate a
hierarchy of one or more lower-level templates 115 and even lower-level
templates (not shown) such that all portions of the respective pipeline(s) 44
are, at some level of the hierarchy, defined in terms of circuit components
(e.g., flip-flops, logic gates) that the PLIC synthesizing and routing tool
recognizes.

[63] Moreover, the communication-shell template 112 may
incorporate a hierarchy of one or more lower-level templates 116 and even
lower-level templates (not shown) such that all portions of the
communication shell 74 other than the hardwired pipeline(s) 44 are, at
some level of the hierarchy, defined in terms of circuit components (e.g.,
flip-flops, logic gates) that the PLIC synthesizing and routing tool
recognizes.
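
The template hierarchy described above can be pictured as a tree that is
descended until only recognized circuit components remain. The following
Python sketch is illustrative only: the Template class, the PRIMITIVES set,
and the template names are assumptions made for this example and are not
taken from the hardware-description file 100.

```python
from dataclasses import dataclass, field

# Hypothetical primitives that a synthesizing and routing tool might recognize.
PRIMITIVES = {"flip_flop", "logic_gate"}


@dataclass
class Template:
    """A template that incorporates lower-level templates and/or primitives."""
    name: str
    children: list = field(default_factory=list)    # lower-level Template objects
    primitives: list = field(default_factory=list)  # primitive component names


def flatten(template):
    """Return every primitive reached by descending the template hierarchy."""
    found = list(template.primitives)
    for child in template.children:
        found.extend(flatten(child))
    return found


def fully_defined(template):
    """True if the hierarchy reaches at least one primitive and every
    component it reaches is a recognized primitive."""
    parts = flatten(template)
    return bool(parts) and all(p in PRIMITIVES for p in parts)


# Illustrative hierarchy loosely modeled on the counter example of template 110.
selector = Template("template_111_selector", primitives=["logic_gate", "logic_gate"])
counter = Template("template_110_counter", children=[selector], primitives=["flip_flop"])
```

In this sketch, fully_defined(counter) holds because every branch of the
hierarchy ends in a component from the assumed primitive set.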

[64] Still referring to FIG. 5, a configuration template 118 provides
definitions for one or more parameters having values that one can set to
configure the circuitry that the templates 101, 108, 110, 112, 114 and
lower-level templates 109, 111, 115, and 116 define. For example, suppose
that the bus interface 82 of the interface-adapter layer 70 (FIG. 4) is
configurable to have either a thirty-two-bit or a sixty-four-bit interface with
the bus connector 56. The configuration template 118 defines a parameter
BUS_WIDTH, the value of which determines the width of the interface
between the interface 82 and the connector 56. For example,
BUS_WIDTH = 0 configures the interface 82 to have a thirty-two-bit interface,
and BUS_WIDTH = 1 configures the interface 82 to have a sixty-four-bit
interface. Examples of other parameters that may be configurable include
the depth of a first-in-first-out (FIFO) data buffer (not shown) disposed
within the framework-services layer 72 (FIGS. 2 - 4), the lengths of
messages received and transmitted by the interface-adapter layer 70, and
the precision and data structure (e.g., integer, floating-point) of the
hardwired pipeline(s) 44.
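
As a rough illustration of how a bus-width parameter of this kind could be
interpreted, the following Python sketch maps a parameter value to an
interface width. The dictionary layout, the function name, and the
FIFO_DEPTH entry are assumptions made for this example; the configuration
template 118 itself would be written in a hardware-description language.

```python
# Hypothetical encoding of the bus-width parameter discussed above:
# the parameter value selects the interface width in bits.
BUS_WIDTH_BITS = {0: 32, 1: 64}


def bus_width_bits(config):
    """Translate the BUS_WIDTH setting in a configuration mapping to a bit width."""
    value = config["BUS_WIDTH"]
    if value not in BUS_WIDTH_BITS:
        raise ValueError(f"unsupported BUS_WIDTH value: {value!r}")
    return BUS_WIDTH_BITS[value]


# Stand-in for the values held in a configuration template.
# The FIFO_DEPTH entry and its value are invented for illustration.
config_118 = {"BUS_WIDTH": 1, "FIFO_DEPTH": 16}
```

Keeping all settable values in one mapping mirrors the idea that only the
configuration template needs to change when a circuit is reconfigured.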

[65] One or more of the templates 101, 108, 110, 112, 114 and the
lower-level templates (not shown) incorporate the parameters defined in the
configuration template 118. The PLIC synthesizer and router tool (not
shown) configures the interface-adapter layer 70, the framework-services
layer 72, the communication shell 74, and the hardwired pipeline(s) 44
(FIGS. 3 - 4) according to the values in the template 118 during the
synthesis of this circuitry. Consequently, to reconfigure the circuit
parameters represented by the parameters in the configuration template
118, one need only modify the values of these parameters in the template
118, and then rerun the synthesizer and router tool on the file 100.
Alternatively, if one or more of the parameters in the configuration template
118 can be sent to the PLIC as soft-configuration data after instantiation of
the circuit, then one can modify the corresponding circuit parameters by
merely modifying the soft-configuration data. Therefore, according to this
alternative, one may avoid rerunning the synthesizer and router tool on the
file 100. Moreover, templates (e.g., 101, 108, 109, 110, 111, 112, 114, 115,
and 116) that do not incorporate settable parameters such as those
provided by the configuration template 118 are sometimes called modules
or entities, and are typically lower-level templates that include Boolean
expressions that a synthesizer and router tool (not shown) converts into
circuitry for implementing the expressions.

[66] Alternate embodiments of the hardware-description file 100 are
contemplated. For example, although described as defining circuitry for
instantiation on a PLIC, the file 100 may define circuitry for instantiation on
an ASIC.

[67] FIG. 6 is a block diagram of a library 120 that stores PLIC
circuit templates, such as the templates 101, 108, 110, 112, and 114 (and
any existing lower-level templates) of FIG. 5, according to an embodiment
of the invention.

[68] The library 120 has m+1 sections: m sections 122_1 - 122_m for
the respective m platforms that the library supports, and a section 124 for
the hardwired pipelines 44 (FIGS. 2 - 4) that the library supports.

[69] For example purposes, the library section 122_1 is discussed in
detail, it being understood that the other library sections 122_2 - 122_m are
similar.

[70] The library section 122_1 includes a top-level template 101_1,
which is similar in structure to the template 101 of FIG. 5, and which thus
includes top-level definitions 102_1, 104_1, and 106_1 of versions of the
interface-adapter layer 70, the framework-services layer 72, and the
communication shell 74 that are compatible with the platform m = 1.

[71] In this embodiment, we assume that there is only one version
of the interface-adapter layer 70 and one version of the framework-services
layer 72 available for each platform m, and, therefore, that the library
section 122_1 includes only one interface-adapter-layer template 108_1 and
only one framework-services-layer template 110_1. But in an embodiment
that includes multiple versions of the interface-adapter layer 70 and multiple
versions of the framework-services layer 72 for each platform m, the library
section 122_1 would include multiple interface-adapter- and
framework-services-layer templates 108 and 110.

[72] The library section 122_1 also includes n communication-shell
templates 112_1,1 - 112_1,n, which respectively correspond to the
hardwired-pipeline templates 114_1 - 114_n in the library section 124. As
stated above in conjunction with FIG. 3, the communication shell 74
interfaces a hardwired pipeline or hardwired pipelines 44 to the
framework-services layer 72. Because each hardwired pipeline 44 is different
and typically has different interface specifications, the communication shell
74 is typically adapted for each hardwired pipeline. Consequently, in this
embodiment, one provides design adjustments to create a unique version of
the communication shell 74 for each hardwired pipeline 44. The designer
provides these design adjustments by writing a unique communication-shell
template 112 for each hardwired pipeline. Of course, the group of
communication-shell templates 112_1,1 - 112_1,n corresponds only to the
version of the framework-services layer 72 that is defined by the template
110_1; consequently, if there are multiple versions of the framework-services
layer 72 that are compatible with the platform m = 1, then the library section
122_1 includes a respective group of n communication-shell templates 112
for each version of the framework-services layer.

[73] In addition, the library section 122_1 includes a configuration
template 118_1, which defines configuration constants having
designer-selectable values as discussed above in conjunction with the
configuration template 118 of FIG. 5.

[74] Furthermore, each template within the library section 122_1
includes, or is associated with, a respective description 126_1 - 134_1. The
descriptions 126_1 - 132_1,n describe the operational and other parameters of
the circuitry that the respective templates 101_1, 108_1, 110_1, and
112_1,1 - 112_1,n define. Similarly, the description 134_1 describes the
settable parameters in the configuration template 118_1, the values that these
parameters can have, and the meanings of these values. The design tool
discussed below in conjunction with FIGS. 7 - 11 uses the descriptions
126_1 - 134_1 to design and simulate a circuit that includes a combination of
the hardwired pipelines 44_1 - 44_n, which are respectively defined by the
templates 114_1 - 114_n. Examples of parameters that the descriptions
126_1 - 132_1,n may describe include the width of the data bus and the depths
of buffers that the circuit defined by the corresponding template includes, the
latency of the circuit, and the precision of the values received and
generated by the circuit. Furthermore, an example of a settable parameter
and the associated selectable values that the description 134_1 may describe
is BUS_WIDTH, which represents the width of the interface between the
communication interface 80 and the bus connector 56 (FIG. 4), where
BUS_WIDTH = 0 sets the bus width to thirty-two bits and BUS_WIDTH = 1
sets the width to sixty-four bits.

[75] Each of the descriptions 126_1 - 134_1 may be embedded within
the respective template 101_1, 108_1, 110_1, 112_1,1 - 112_1,n, and 118_1 to
which it corresponds. For example, the description 128_1 may be embedded
within the template 108_1 as extensible markup language (XML) tags or
comments that are readable by both a human and the tool discussed below in
conjunction with FIGS. 7 - 11.
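
To make the idea of a machine-readable description concrete, the following
Python sketch parses a small XML description with the standard-library
xml.etree.ElementTree module. The element names, attribute names, and
parameter values are hypothetical and chosen only for illustration; the text
does not define a schema for the descriptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical description; the element and attribute names are invented
# for illustration and are not a schema defined by the library 120.
DESCRIPTION_XML = """
<description template="108_1">
  <parameter name="bus_width_bits" value="32"/>
  <parameter name="buffer_depth" value="16"/>
  <parameter name="latency_cycles" value="4"/>
</description>
"""


def read_description(xml_text):
    """Parse a description into (template id, {parameter name: integer value})."""
    root = ET.fromstring(xml_text.strip())
    params = {p.get("name"): int(p.get("value")) for p in root.findall("parameter")}
    return root.get("template"), params


template_id, params = read_description(DESCRIPTION_XML)
```

A tool could read such a structure without synthesizing the template, which
is the property that makes a description useful during design and simulation.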

[76] Alternatively, each description 126_1 - 134_1 may be disposed in
a separate file that is linked to the template to which the description
corresponds, and this file may be written in a language other than XML.
For example, the description 126_1 may be disposed in a file that is linked to
the top-level template 101_1.

[77] The section 122_1 of the library 120 also includes a description
136_1, which describes the parameters of the platform m = 1. The design tool
discussed below in conjunction with FIGS. 7 - 11 may use the description
136_1 to determine which platforms the library 120 supports. Examples of
parameters that the description 136_1 may describe include 1) for each
interface, the message specification, which lists the transmitted variables
and the constraints for those variables, and 2) a behavior specification and
any behavior constraints. Messages that the host processor 12 (FIG. 1)
sends to the pipeline units 50 (FIG. 2) and that the pipeline units send
among themselves are further discussed in previously incorporated U.S.
Patent Publication No. 2004/0181621. Examples of other parameters that
the description 136_1 may describe include the size and resources (e.g., the
number of multipliers and the amount of available memory) of the PLIC 60
(FIGS. 2 - 4). Furthermore, the platform description 136_1 may be written in
XML or in another language.

[78] Still referring to FIG. 6, the section 124 of the library 120
includes n hardwired-pipeline templates 114_1 - 114_n, which each define a
respective hardwired pipeline 44_1 - 44_n (FIGS. 2 - 4). As discussed above
in conjunction with FIG. 5, because the templates 114_1 - 114_n are platform
independent (the corresponding communication-shell templates 112_m,1 -
112_m,n define the specified interface to the interface-adapter and
framework-services layers 70 and 72 of FIGS. 3 - 4), the library 120 stores
only one template 114 for each hardwired pipeline 44 (FIGS. 2 - 4). That
is, each hardwired pipeline 44 does not require a separate template 114 for
each platform that the library 120 supports. As discussed above, an
advantage of this top-down design is that one need only create a single
template 114 to define a hardwired pipeline 44, not m templates.
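
The organization of the library into m platform sections and one shared
pipeline section can be sketched as a nested mapping. The following Python
sketch is a loose model only: the build_library function, the key names, and
the string labels are assumptions for this example and do not describe how
the library 120 is actually stored.

```python
def build_library(platforms, pipelines):
    """Build a toy library: one section per platform plus one shared pipeline section."""
    sections = {}
    for p in platforms:
        sections[p] = {
            "top_level": f"101_{p}",
            "interface_adapter": f"108_{p}",
            "framework_services": f"110_{p}",
            # One communication-shell template per hardwired pipeline.
            "communication_shells": {k: f"112_{p},{k}" for k in pipelines},
            "configuration": f"118_{p}",
        }
    # Pipeline templates are platform independent, so each is stored once.
    pipeline_section = {k: f"114_{k}" for k in pipelines}
    return {"platform_sections": sections, "pipeline_section": pipeline_section}


# Three platforms and two hardwired pipelines, chosen arbitrarily.
library = build_library(platforms=[1, 2, 3], pipelines=[1, 2])
```

With three platforms and two pipelines, this toy library holds six
communication-shell templates but only two hardwired-pipeline templates,
which reflects the saving described above.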

[79] Furthermore, each hardwired-pipeline template 114 includes,
or is associated with, a respective description 138_1 - 138_n, which describes
the parameters of the hardwired pipeline 44 that the template defines. Like
the descriptions 126_1 - 134_1 discussed above, the design tool discussed
below in conjunction with FIGS. 7 - 11 uses the descriptions 138 to design
and simulate a circuit that includes a combination of the hardwired pipelines
44_1 - 44_n, which are respectively defined by the templates 114_1 - 114_n.
Examples of parameters that the descriptions 138_1 - 138_n may describe
include the type (e.g., floating point or integer) and precision of the data that
the corresponding hardwired pipeline 44 can receive and generate, and the
latency of the pipeline. Also like the descriptions 126_1 - 134_1, each of the
descriptions 138_1 - 138_n may be embedded within the respective template
114_1 - 114_n to which the description corresponds as, e.g., XML tags, or
may be disposed in a separate file that is linked to the template to which the
description corresponds.

[80] Referring again to the library section 122_1, this section also
includes a description 140 of the one or more available pipeline
accelerators 14 (FIG. 1) that support the platform m = 1. More specifically,
the description 140 describes the resources that each of the pipeline
accelerators 14 includes. For example, the description 140 may indicate
that one available accelerator 14 includes only one pipeline unit 50 (FIG. 2),
while another available accelerator includes five pipeline units. The
description 140 may be written in XML or in another language.

[81] Still referring to FIG. 6, alternate embodiments of the library
120 are contemplated. For example, instead of each template within each
library section 122_1 - 122_m being associated with a respective description
126 - 134, each library section 122_1 - 122_m may include a single
description that describes all of the templates within that library section. For
example, this single description may be embedded within or linked to the
top-level template 101 or the configuration template 118. Furthermore,
although each library section 122_1 - 122_m is described as including a
respective communication-shell template 112 for each hardwired-pipeline
template 114 in the library section 124, each section 122 may include fewer
communication-shell templates, at least some of which are compatible with,
and thus correspond to, more than one pipeline template 114. In an
extreme, each library section 122_1 - 122_m may include only a single
communication-shell template 112, which is compatible with all of the
hardwired-pipeline templates 114 in the library section 124. In addition, the
library section 124 may include respective versions of each pipeline
template 114 for each communication-shell template 112 in the library
sections 122_1 - 122_m.

[82] FIG. 7 is a block diagram of a circuit-design system 150, which
includes a computer-based software tool 152 for designing a circuit using
templates from the library 120 of FIG. 6 according to an embodiment of the
invention. By using library templates, the tool 152 allows one to design a
circuit that includes a combination of one or more previously designed and
debugged hardware-interface layers 62 (FIG. 2) and hardwired pipelines 44
(FIGS. 2 - 4). Because another has already tested and debugged the one
or more layers 62 and pipelines 44, the tool 152 may significantly decrease
the time required for one to design such a combination circuit as compared
to a conventional design progression. Furthermore, where one wants to
design a circuit for executing an algorithm, the tool 152 allows him to define
the circuit with an expression of conventional mathematical symbols, where
the expression defines the algorithm; consequently, one having little or no
experience in circuit design can use the tool to design a circuit for executing
an algorithm.

[83] The system 150 includes a processor (not shown) for executing
the software code that composes the tool 152. Consequently, in response
to the code, the processor performs the functions that are attributed to the
tool 152 in the discussion below. But for clarity of explanation, the tool 152,
not the processor, is described as performing the actions.

[84] In addition to the processor, the system 150 includes an input
device 154, a display device 155, and the library 120 of FIG. 6. The input
device 154, which may include a keyboard and a mouse, allows one to
provide to the tool 152 information that describes an algorithm and that
describes a circuit for executing the algorithm. Such information may
include an expression of mathematical symbols, circuit parameters (e.g.,
buffer width, latency), operation exceptions (e.g., a divide by zero), and the
platform on which one wishes to instantiate the circuit. And as described
below, the device 155 displays the input information and other information,
and the library 120 includes the templates that the tool 152 uses to build the
circuit and to generate a file that defines the circuit.

[85] The tool 152 includes a symbolic-math front end 156, an
interpreter 158, a generator 160 for generating a file 162 that defines a
circuit, and a simulator 164.

[86] The front end 156 receives from the input device 154 the
mathematical expression that defines the algorithm that the circuit is to
execute and other design information, and converts this information into a
form that is readable by the interpreter 158. To allow one to define a circuit
in terms of the mathematical expression that defines the algorithm that the
circuit is to execute, in one embodiment the front end 156 includes a web
browser that accepts XML with a schema for Math Markup Language
(MathML). MathML is a software standard that allows one to enter
expressions using conventional mathematical symbols. The schema of
MathML is a conventional plug-in that imparts to a web browser this same
ability, i.e., the ability to enter expressions using mathematical symbols.
Alternatively, the front end 156 may utilize another technique for allowing
one to define a circuit using a mathematical expression. Examples of such
another technique include the technique used by the conventional software
mathematical-expression solver MathCAD. Furthermore, as discussed
below, one may enter the identity of a platform or pipeline accelerator 14
(FIG. 1) on which he wants the circuit instantiated, and may enter test data
with which the simulator 164 will simulate the operation of the circuit.
Moreover, one may enter valid-range constraints for any variables within
the entered mathematical expression and constraints on execution of the
expression, and may specify the action(s) to be taken if the constraints are
violated. For example, because -1 ≤ sin(x) ≤ 1 for all values of x, for an
expression that includes sin(x), one may enter this constraint, and specify
that any data generated from a value of sin(x) outside of this range is to be
disregarded. Or, because division of any x by zero yields infinity, one may
specify that data generated in response to a division by zero is to be
disregarded. The front end 156 then converts all of the entered information
into a format, such as HDL, that is compatible with the interpreter 158.
Moreover, as discussed above, the front end 156 may cause the device 155
to display the input information and other related information. For example,
the front end 156 may cause the device 155 to display the mathematical
expression that the designer enters to define the algorithm to be executed
by the circuit.
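
The valid-range and exception constraints described above can be
illustrated in software. The following Python sketch shows one possible
reading of "disregard" (the result is simply dropped); the function name and
its arguments are assumptions for this example, and a hardware
implementation would handle violations differently.

```python
import math


def evaluate_with_constraints(x_values, func, low, high):
    """Apply func to each input and keep only results inside [low, high].

    Results outside the range, and inputs that raise ZeroDivisionError,
    are disregarded, mirroring the designer-specified action described above.
    """
    kept = []
    for x in x_values:
        try:
            y = func(x)
        except ZeroDivisionError:
            continue  # disregard data generated by a division by zero
        if low <= y <= high:
            kept.append(y)
    return kept


# A reciprocal with a zero input: the zero is skipped rather than propagated.
reciprocals = evaluate_with_constraints([0, 1, 2], lambda x: 1 / x, -1.0, 1.0)
```

For a well-behaved sine implementation the range check never rejects a
value, so the constraint acts as a guard against faulty intermediate data.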

[87] The interpreter 158 parses the information from the front end
156 and determines: 1) whether the library 120 includes templates 114
(FIG. 6) defining hardwired pipelines 44 (FIGS. 2 - 4) that, when combined,
can execute the algorithm entered by the designer, and 2) if the answer to
(1) is "yes," which, if any, of the available pipeline accelerators 14 (FIG. 1)
described by the description 140 in the library 120 have sufficient resources
to instantiate a circuit that can execute the algorithm. For example,
suppose the algorithm includes the mathematical operation √v. If the library
120 does not include a template 114 (FIG. 6) defining a hardwired pipeline
44 (FIGS. 2 - 4) that calculates the square root of a value, then the
interpreter 158 determines that the tool 152 cannot generate a file 162 that
defines a circuit for executing the algorithm. Furthermore, suppose that the
circuit for executing the algorithm requires the resources of at least five
PLICs 60 (FIGS. 2 - 4). If the description 140 indicates that the available
accelerators 14 each have only three pipeline units 50 (FIG. 2), and thus
each have only three PLICs 60, then the interpreter 158 determines that
even though the tool 152 may be able to generate a file 162 that defines a
circuit for executing the algorithm, one cannot implement this circuit on an
available accelerator. The interpreter 158 makes a similar determination if
the designer indicates that he wants the algorithm executed by a circuit
having a sixty-four-bit bus width, but the available platforms support only a
thirty-two-bit bus width. In situations where the interpreter 158 determines
that the tool 152 cannot generate a circuit for executing the desired
algorithm or that one cannot implement the circuit on an existing platform
and/or accelerator 14, the interpreter 158 causes the device 155 to display
an appropriate error message (e.g., "no library template for instantiating
√v," "insufficient PLIC resources," "bus-width not supported").
Furthermore, where the designer identifies a platform or accelerator 14 on
which he desires to instantiate the resulting circuit, the interpreter 158
determines whether the circuit can be instantiated on the identified platform
or accelerator. But if the circuit cannot be so instantiated, the interpreter
158 may determine that the circuit can be instantiated on another platform
or accelerator, and thus may so inform the designer with an appropriate
message via the display device 155. This allows the designer the choice of
instantiating the circuit on another platform or accelerator 14.
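
The checks that the interpreter performs can be summarized in a short
software sketch. The following Python function is illustrative only: its
name, its arguments, and the assumption of one PLIC per pipeline unit are
made for this example, while the error strings echo the example messages
given above.

```python
def check_feasibility(required_ops, library_ops, plics_needed, accelerators,
                      bus_width, supported_widths):
    """Return a list of error messages; an empty list means the design is feasible.

    accelerators maps a (hypothetical) accelerator name to its number of
    pipeline units, each assumed to carry exactly one PLIC.
    """
    errors = []
    # 1) Every operation in the algorithm needs a hardwired-pipeline template.
    for op in sorted(required_ops - library_ops):
        errors.append(f"no library template for instantiating {op}")
    # 2) At least one available accelerator must offer enough PLICs.
    if not any(units >= plics_needed for units in accelerators.values()):
        errors.append("insufficient PLIC resources")
    # 3) The requested bus width must be supported by an available platform.
    if bus_width not in supported_widths:
        errors.append("bus-width not supported")
    return errors


# Mirrors the examples above: no square-root template, five PLICs needed but
# only three available, and a 64-bit bus requested on a 32-bit platform.
messages = check_feasibility({"sqrt", "add"}, {"add"}, 5, {"accel_a": 3}, 64, {32})
```

Collecting all messages, rather than stopping at the first failure, is a
design choice of this sketch and is not stated in the text.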

[88] If the interpreter 158 determines that the library 120 includes a
sufficient number of hardwired-pipeline templates 114 (FIG. 6) to define a
circuit that can execute the desired algorithm, and also determines that the
circuit can be instantiated on an available platform and accelerator 14 (FIG.
1), then the interpreter provides to the file generator 160 the identities of the
hardwired-pipeline templates 114 that correspond to portions of the
algorithm.

[89] The file generator 160 combines the hardwired pipelines 44
(FIGS. 2 - 4) defined by the identified hardwired-pipeline templates 114
such that the combination forms a circuit that can execute the algorithm.

[90] The generator 160 then generates the file 162, which defines
the circuit for executing the algorithm in terms of the hardwired pipelines 44
(FIGS. 2 - 4) and the hardware-interface layers 62 (FIG. 2) that compose
the circuit, the PLIC(s) 60 (FIGS. 2 - 3) on which the pipelines are disposed,
and the interconnections between the pipelines (if multiple pipelines on a
PLIC) and/or between the PLICs (if the pipelines are disposed on more than
one PLIC).

[91] Next, the host processor 12 (FIG. 1) can use the file 162 to
instantiate on the pipeline accelerator 14 (FIG. 1) the defined circuit as
discussed in previously incorporated U.S. Patent App. Ser. No. (Attorney
Docket No. 1934-25-3). Alternatively, also as discussed in U.S. Patent App.
Ser. No. (Attorney Docket No. 1934-25-3), the host processor 12 may
instantiate some or all portions of the defined circuit in software executed by
the processing unit 32. Or, one can instantiate the circuit defined by the file
162 in another manner.

[92] The simulator 164 receives the file 162 from the generator 160
and receives from the front end 156 designer-entered test data, such as a
test vector, designer-entered constraint data, and a designer-entered
exception-handling protocol, and then simulates operation of the circuit
defined by the file 162. The simulator 164 also gathers parameter
information (e.g., precision, latency) from the description files 138 (FIG. 6)
that correspond to the hardwired-pipeline templates 114 that define the
pipelines 44 that compose the circuit. The simulator 164 may retrieve this
parameter information directly from the library 120, or the generator 160
may include this parameter information in the file 162.

[93] FIG. 8 illustrates the parsing of a symbolic mathematical
expression by the interpreter 158 according to an embodiment of the
invention. In other words, the syntax of the design language is the same as
that used by mathematicians for writing algebraic equations. The
explanations that follow show how a symbolic mathematical expression is a
sufficient syntax for defining the hardwired pipelines 44 from a simple set of
circuit primitives.

[94] FIG. 9 illustrates a table of hardwired-pipeline templates 114,
which correspond to the hardwired pipelines 44 (FIGS. 2 - 4) that the
interpreter 158 (FIG. 7) identifies for executing portions of the parsed
algorithm (FIG. 8) according to an embodiment of the invention.

[95] Referring to FIGS. 5 - 9, the operation of the tool 152 is
discussed according to an embodiment of the invention.

[96] Suppose that one wishes to design a circuit that solves for a
value y, which equals a mathematical expression according to the following
equation:

(2)    y = \sqrt{x^{4}\cos(z) + z^{3}\sin(x)}

Also suppose that x, y, and z are thirty-two-bit floating-point values.

[97] Using the input device 154, the designer enters equation (2)
into the front end 156 of the tool 152 by entering the following sequence of
mathematical symbols: "√", "x^4", "cos(z)", "+", "z^3", and "sin(x)". The
designer also enters information specifying the input and output message
specifications, for example indicating that x, y, and z are thirty-two-bit
floating-point values. The designer may also enter information indicating
desired operating parameters, such as the desired latency, in clock cycles,
from inputs x and z to output y, and the desired types and precision of any
intermediate values, such as cos(z) and sin(x), generated during the
calculation of y. Furthermore, the designer may enter information that
identifies a desired platform or pipeline accelerator 14 (FIG. 1) on which he
wants the circuit instantiated. Moreover, the designer may specify the
accuracy of any mathematical approximations that the tool 152 may make.
For example, if the tool 152 approximates cos(z) using a Taylor series
expansion, then by specifying the accuracy of this approximation, the
designer effectively specifies the number of terms needed in the expansion.
Alternatively, the designer may directly specify the number of terms in the
expansion. The implementation of a function as a Taylor series expansion
is further described below in conjunction with FIGS. 13 - 17.
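
The relationship between a specified accuracy and the number of Taylor
terms can be made concrete. The following Python sketch assumes that cos(z)
is expanded about zero, that the inputs satisfy |z| <= z_max, and that the
accuracy is an absolute error bound; it uses the standard Lagrange remainder
bound z_max^(2n) / (2n)! for the first n terms. The function names and these
assumptions are illustrative and are not taken from the tool 152.

```python
import math


def cos_terms_needed(z_max, tolerance):
    """Smallest number of Taylor terms n such that the remainder bound
    z_max**(2n) / (2n)! does not exceed tolerance for all |z| <= z_max."""
    n = 1
    while z_max ** (2 * n) / math.factorial(2 * n) > tolerance:
        n += 1
    return n


def cos_taylor(z, n_terms):
    """Evaluate the first n_terms of the series of cos(z) about zero."""
    return sum((-1) ** k * z ** (2 * k) / math.factorial(2 * k)
               for k in range(n_terms))


# Number of terms for inputs up to pi and an absolute accuracy of 1e-6.
n_pi = cos_terms_needed(math.pi, 1e-6)
```

Specifying the accuracy and calling cos_terms_needed, or specifying n_terms
directly, correspond to the two alternatives described above.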

[98] The front end 156 converts these mathematical symbols and
the other information into a format compatible with the interpreter 158 if this
information is not already in a compatible format.

[99] Next, the interpreter 158 determines whether any of the
hardwired-pipeline templates 114 in the library 120 defines a hardwired
pipeline 44 that can solve for y in equation (2) within the specified behavior
and operating parameters and that can be instantiated within the desired
platform and on the desired pipeline accelerator 14 (FIG. 1).

[100] If the library 120 does include such a template 114, then the
interpreter 158 informs the designer, via the display device 155, that a
conventional FPGA synthesizing and routing tool can generate firmware for
instantiating this hardwired pipeline 44 from the identified template 114, the
corresponding communication-shell template 112, and the corresponding
top-level template 101.

[101] If, however, the library 120 includes no template 114 that
defines a hardwired pipeline 44 that can solve for y in equation (2), then the
interpreter 158 parses the equation (2) into portions, and determines
whether the library includes templates 114 that define hardwired pipelines
44 for executing these portions within the specified behavior, operating
parameters, and platform and on the specified pipeline accelerator 14 (FIG.
1).

[102] To identify a circuit that can solve for y in equation (2) but that
includes the fewest hardwired pipelines 44, the interpreter 158
parses the equation (2) according to a top-down parsing sequence as
discussed below. Typically, this top-down parsing sequence corresponds
to the known algebraic laws for the order of operations.

[103] First, the interpreter 158 parses the equation (2) into the
following two portions: "√", which is portion 170 in FIG. 8, and
"x^4 cos(z) + z^3 sin(x)", which is portion 172.

[104] If the interpreter 158 determines that the library 120 includes at
least two hardwired-pipeline templates 114 that define hardwired pipelines
44 for respectively executing the portions 170 and 172 of equation (2), then
the interpreter passes the identity of these templates to the file generator
160.

[105] In this example, however, the interpreter 158 determines that
although the library 120 includes a hardwired-pipeline template 114 that
defines a pipeline 44 for executing the square-root operation 170 of
equation (2), the library includes no hardwired-pipeline template that
defines a pipeline for executing the portion 172.

[106] Next, the interpreter 158 parses the portion 172 of equation (2).
Specifically, the interpreter 158 parses the portion 172 into the following
three respective portions 174, 176, and 178: "x^4 cos(z)", "+", and
"z^3 sin(x)".

[107] If the interpreter 158 determines that the library 120 includes at
least three hardwired-pipeline templates 114 that define hardwired pipelines
44 for respectively executing the portions 174, 176, and 178 of equation (2),
then the interpreter passes the identity of these templates to the file
generator 160.

[108] In this example, however, the interpreter 158 determines that
although the library 120 includes a hardwired-pipeline template 114 that
defines a hardwired pipeline 44 for executing the summing operation 176 of
equation (2), the library includes no templates 114 that define hardwired
pipelines for executing the portions 174 or 178.

[109] Next, the interpreter 158 parses the portions 174 and 178 of
equation (2). Specifically, the interpreter 158 parses the portion 174 into
three portions 180 ("x^4"), 182 ("•"), and 184 ("cos(z)"), and parses the
portion 178 into three portions 186 ("z^3"), 188 ("•"), and 190 ("sin(x)").

[110] If the interpreter 158 determines that the library 120 does not
include hardwired-pipeline templates 114 that define hardwired pipelines 44
for respectively executing each of the portions 180, 182, 184, 186, 188, and
190, then the interpreter displays via the device 155 an error message
indicating that the library does not support a circuit that can solve for y in




equation (2). In one embodiment of the invention, however, the library 120
includes hardwired-pipeline templates 114 that provide the primitive
operations for multiplication and for raising variables to a power (e.g.,
cubing a value by using two multipliers in sequence) for single- or
double-precision floating-point data types, and for data-type conversion.
Also in this embodiment, the tool 152 recognizes common factors, for
example that x is a factor of x^3 if sin(x^3) were needed instead of sin(x),
and generates circuitry to provide these common factors from chained
multipliers.

[111] In this example, however, the interpreter 158 determines that
the library 120 includes hardwired-pipeline templates 114 that define
hardwired pipelines 44 for respectively executing each portion 180, 182,
184, 186, 188, and 190 of equation (2).
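
The top-down parsing sequence described above can be sketched as a recursive walk over the portions of equation (2). This is only an illustrative sketch: the `Portion` class, the `select_templates` function, the string labels, and the `LIBRARY` set are hypothetical names introduced here, not part of the tool 152. A portion is accepted as soon as the library holds a template for it; otherwise it is split into finer portions, and a portion that can be neither matched nor split triggers the error case.

```python
from dataclasses import dataclass, field


@dataclass
class Portion:
    """One portion of an equation; `children` are its finer-grained portions."""
    label: str
    children: list = field(default_factory=list)


def select_templates(portion, library):
    """Parse top-down: stop at the first level the library can execute."""
    if portion.label in library:
        return [portion.label]
    if not portion.children:
        # Corresponds to the error message: the library cannot support the circuit.
        raise LookupError(f"no template for portion {portion.label!r}")
    selected = []
    for child in portion.children:
        selected.extend(select_templates(child, library))
    return selected


# Hypothetical encoding of equation (2) and of the library contents.
EQUATION_2 = Portion("sqrt(x^4*cos(z) + z^3*sin(x))", [
    Portion("sqrt"),                                   # portion 170
    Portion("x^4*cos(z) + z^3*sin(x)", [               # portion 172
        Portion("x^4*cos(z)",                          # portion 174
                [Portion("x^4"), Portion("*"), Portion("cos(z)")]),
        Portion("+"),                                  # portion 176
        Portion("z^3*sin(x)",                          # portion 178
                [Portion("z^3"), Portion("*"), Portion("sin(x)")]),
    ]),
])
LIBRARY = {"sqrt", "+", "*", "x^4", "cos(z)", "z^3", "sin(x)"}

portions = select_templates(EQUATION_2, LIBRARY)
print(len(portions), len(set(portions)))  # 8 portions, 7 distinct templates
```

Because the walk stops at the coarsest portion that the library supports, it tends toward a circuit with few pipelines, which is the stated goal of the top-down sequence.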

[112] Then, the interpreter 158 provides to the file generator 160 the
identities of all the hardwired-pipeline templates 114 that define the
hardwired pipelines 44 for executing the following eight portions of equation
(2): 170 ("√"), 176 ("+"), 180 ("x^4"), 182 ("•"), 184 ("cos(z)"),
186 ("z^3"), 188 ("•"), and 190 ("sin(x)").

[113] Referring to FIGS. 6-10, the file generator 160 generates a
table 192 (FIG. 9) of the hardwired-pipeline templates 114 identified by the
interpreter 158, and displays this table via the device 155. In a first column
194, the table 192 lists the portions 170 ("√"), 176 ("+"), 180 ("x^4"),
182 ("•"), 184 ("cos(z)"), 186 ("z^3"), 188 ("•"), and 190 ("sin(x)") of
equation (2). In a second column 196, the table 192 lists the
hardwired-pipeline template or templates 114 that define a hardwired
pipeline 44 for executing the respective portion of equation (2). And in a
third column 198, the table 192 lists parameters, such as the latency (in
units of cycles of the signal that clocks the defined pipeline 44) and the
input and output precision, of the hardwired pipeline(s) 44 defined by the
templates 114 in the second column 196. As shown in the table 192, in this
example the seven hardwired-pipeline templates 114_1 - 114_7 in column 196
define hardwired pipelines 44_1 - 44_7 for respectively executing the
corresponding portions of equation (2) in column 194. There are only seven
pipeline templates 114_1 - 114_7 for the eight portions of equation (2)
because the template 114_5 defines a multiplier pipeline 44_5 that can
execute both "•" portions 182 and 188. Furthermore, although we have labeled
the pipeline templates as 114_1 - 114_7, it is not required that these
templates be sequentially ordered within the library 120. Moreover, the
library 120, and thus the table 192, may include multiple templates 114 that
define respective pipelines for executing each of the eight portions 170,
176, 180, 182, 184, 186, 188, and 190 of equation (2).

[114] Next, using the table 192, the file generator 160 selects the
pipelines 44 from which to build a circuit that solves for y in equation (2).
The generator 160 selects these pipelines 44 based on the behavior(s),
operating parameter(s), platform(s), and pipeline accelerator(s) 14 (FIG. 1)
that the designer specified. For example, if the designer specified that x, y,
and z are thirty-two-bit floating-point quantities, then the generator 160
selects pipelines 44 that operate on thirty-two-bit floating-point numbers. If
the available pipelines 44 for a particular portion of the equation (2) do not
meet all of the designer's specifications, then the generator 160 may use a
default set of rules to select the best pipeline. For example, the rules may
indicate that if there is no available pipeline 44 that meets the specified
latency and precision requirements, then, with the designer's authorization,
the generator 160 defaults to the pipeline having the specified precision and
the latency closest to the specified latency. Otherwise a new pipeline 44
with the specified latency is placed in the library, or the designer can select
another pipeline from the table 192. As an example of satisfying the latency
requirements, two versions of an x^4 circuit may be represented by
respective hardwired-pipeline templates 114 in the library 120: a pipelined
version using two fully registered multipliers in a cascade, or an in-place
version using a single, fully registered multiplier, a one-bit counter, and a
multiplexer. The pipelined version consumes roughly twice the circuit
resources but accepts one input value every clock cycle. In contrast, the
in-place version consumes fewer circuit resources but accepts a new input
value only every other clock cycle.
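
The default selection rule described above can be sketched as follows. This is a hedged illustration only: `PipelineInfo`, `select_pipeline`, and the sample latencies are hypothetical, and the sketch assumes that "meeting" a latency requirement means having a latency no greater than the specified one, and that the fastest qualifying pipeline is preferred. The second return value flags that the designer's authorization would be needed for the fallback choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineInfo:
    name: str
    precision_bits: int
    latency_cycles: int


def select_pipeline(candidates, precision_bits, max_latency):
    """Return (pipeline, needs_authorization) under the assumed default rule."""
    same_precision = [c for c in candidates if c.precision_bits == precision_bits]
    if not same_precision:
        return None, False  # nothing to fall back on; a new template is needed
    meeting = [c for c in same_precision if c.latency_cycles <= max_latency]
    if meeting:
        # Assumption: prefer the lowest latency among pipelines that qualify.
        return min(meeting, key=lambda c: c.latency_cycles), False
    # Fallback: specified precision, latency closest to the specified latency.
    closest = min(same_precision, key=lambda c: abs(c.latency_cycles - max_latency))
    return closest, True


# Hypothetical rows of table 192 for the square-root portion.
table = [
    PipelineInfo("sqrt_a", 32, 20),
    PipelineInfo("sqrt_b", 32, 12),
    PipelineInfo("sqrt_c", 64, 8),
]
choice, needs_ok = select_pipeline(table, precision_bits=32, max_latency=10)
print(choice.name, needs_ok)  # sqrt_b True
```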

[115] Then, the file generator 160 interconnects the selected
hardwired pipelines 44 to form a circuit 200 (FIG. 10) that can solve for y in
equation (2). The generator 160 also generates a schematic diagram of the
circuit 200 for display via the device 155.

[116] To form the circuit 200, the file generator 160 first determines
how the selected hardwired pipelines 44_1 - 44_7 can "fit" into the resources
of a specified accelerator 14 (FIG. 1) (or a default accelerator if the
designer does not specify one). For example, the file generator 160
calculates the number of PLICs 60 (FIG. 3) needed to contain the eight
instances of the pipelines 44_1 - 44_7 (this includes two instances of the
pipeline 44_5).

[117] In this example, the generator 160 determines that each PLIC
60 (FIG. 3) can hold only a respective one of the pipelines 44_1 - 44_7;
consequently, the generator 160 determines that eight pipeline units 50_1 -
50_8 are needed to instantiate the circuit 200.

[118] Next, based on the platform that the designer specifies, the
generator 160 "inserts" into each of the PLICs 60_1 - 60_8 of the pipeline
units 50_1 - 50_8 a respective hardware-interface layer 62_1 - 62_8. Assuming
that the designer specifies platform m = 1, the generator 160 generates the
layers 62_1 - 62_8 from the following templates in section 122_1 of the
library 120: the interface-adapter-layer template 108_1, the
framework-services-layer template 110_1, and the communication-shell
templates 112_{1,1} - 112_{1,7}, which respectively correspond to the pipeline
templates 114_1 - 114_7, and thus to the pipelines 44_1 - 44_7. More
specifically, the generator 160 generates the hardware-interface layer 62_1
from the interface-adapter-layer template 108_1, the
framework-services-layer template 110_1, and the communication-shell
template 112_{1,1}. Similarly, the generator 160 generates the
hardware-interface layer 62_2 from the templates 108_1, 110_1, and
112_{1,2}, the hardware-interface layer 62_3 from the templates 108_1,
110_1, and 112_{1,3}, and so on. Furthermore, because the PLICs 60_5 and
60_6 both will include the multiplier pipeline 44_5, the generator 160
generates both of the hardware-interface layers 62_5 and 62_6 from the
interface-adapter and framework-services templates 108_1 and 110_1 and from
the communication-shell template 112_{1,5}; consequently, the
hardware-interface layers 62_5 and 62_6 are identical but are instantiated on
respective PLICs 60_5 and 60_6. Moreover, the generator 160 generates the
hardware-interface layer 62_7 from the templates 108_1, 110_1, and
112_{1,6}, and the hardware-interface layer 62_8 from the templates 108_1,
110_1, and 112_{1,7}.

[119] Then, the generator 160 "inserts" into each hardware-interface
layer 62_1 - 62_8 a respective hardwired pipeline 44_1 - 44_7 (the generator
160 inserts the pipeline 44_5 into both of the hardware-interface layers 62_5
and 62_6, the pipeline 44_6 into the hardware-interface layer 62_7, and the
pipeline 44_7 into the hardware-interface layer 62_8). More specifically, the
generator 160 inserts the pipelines 44_1 - 44_7 into the hardware-interface
layers 62_1 - 62_8 by respectively inserting the hardwired-pipeline templates
114_1 - 114_7 into the communication-shell templates 112_{1,1} - 112_{1,7}.

[120] Next, the generator 160 interconnects the pipeline units 50_1 -
50_8 to form the circuit 200, which generates the value y from equation (2) at
its output (i.e., the output of the pipeline unit 50_8).

[121] Referring to FIG. 10, the circuit 200 includes an input stage
206, first and second intermediate stages 208 and 210, and an output stage
212, and operates as follows. The input stage 206 includes the hardwired
pipelines 44_1 - 44_4. The pipeline 44_1 receives a stream of values x via an
input portion of the hardware-interface layer 62_1 and generates, in a
pipelined fashion, a corresponding stream of values sin(x) via an output
portion of the layer 62_1. Likewise, the pipeline 44_2 receives a stream of
values z via an input portion of the hardware-interface layer 62_2 and
generates, in a pipelined fashion, a corresponding stream of values z^3 via
an output portion of the layer 62_2, the pipeline 44_3 receives the stream of
values x via an input portion of the hardware-interface layer 62_3 and
generates, in a pipelined fashion, a corresponding stream of values x^4 via
an output portion of the layer 62_3, and the pipeline 44_4 receives the
stream of values z via an input portion of the hardware-interface layer 62_4
and generates, in a pipelined fashion, a corresponding stream of values
cos(z) via an output portion of the layer 62_4.

[122] The first intermediate stage 208 of the circuit 200 includes two
instantiations of the pipeline 44_5 and operates as follows. The pipeline
44_5 in the PLIC 60_5 receives the streams of values sin(x) and z^3 from the
input stage 206 via an input portion of the hardware-interface layer 62_5 and
generates, in a pipelined fashion, a corresponding stream of values
z^3 sin(x) via an output portion of the layer 62_5. Similarly, the pipeline
44_5 in the PLIC 60_6 receives the streams of values x^4 and cos(z) from the
input stage 206 via an input portion of the hardware-interface layer 62_6 and
generates, in a pipelined fashion, a corresponding stream of values
x^4 cos(z) via an output portion of the layer 62_6.

[123] The second intermediate stage 210 of the circuit 200 includes
the hardwired pipeline 44_6, which receives the streams of values z^3 sin(x)
and x^4 cos(z) from the first intermediate stage 208 via an input portion of
the hardware-interface layer 62_7, and generates, in a pipelined fashion, a
corresponding stream of values z^3 sin(x) + x^4 cos(z) via an output portion
of the layer 62_7.

[124] And the output stage 212 of the circuit 200 includes the
hardwired pipeline 44_7, which receives the stream of values z^3 sin(x) +
x^4 cos(z) from the second intermediate stage 210 via an input portion of the
hardware-interface layer 62_8, and generates, in a pipelined fashion, a
corresponding stream of values y = √(z^3 sin(x) + x^4 cos(z)) via an output
portion of the layer 62_8.
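
The four-stage dataflow of the circuit 200 can be modeled in software as a chain of stream transformations. The sketch below is only a behavioral model under simplifying assumptions: the function name `circuit_200` is hypothetical, Python lists stand in for the value streams, and latencies, hardware-interface layers, and data precision are ignored.

```python
import math


def circuit_200(xs, zs):
    """Behavioral model of FIG. 10: one list per stream, one step per pipeline."""
    # Input stage 206: four pipelines fed in parallel by the x and z streams.
    sin_x = [math.sin(x) for x in xs]   # sin(x) pipeline
    z3 = [z ** 3 for z in zs]           # z^3 pipeline
    x4 = [x ** 4 for x in xs]           # x^4 pipeline
    cos_z = [math.cos(z) for z in zs]   # cos(z) pipeline

    # First intermediate stage 208: two instances of the multiplier pipeline.
    z3_sin_x = [a * b for a, b in zip(z3, sin_x)]
    x4_cos_z = [a * b for a, b in zip(x4, cos_z)]

    # Second intermediate stage 210: the adder pipeline.
    sums = [a + b for a, b in zip(z3_sin_x, x4_cos_z)]

    # Output stage 212: the square-root pipeline (raises ValueError if a sum < 0).
    return [math.sqrt(s) for s in sums]


ys = circuit_200([1.0, 2.0], [0.5, 1.0])
print(ys)
```

Each list comprehension corresponds to one PLIC in FIG. 10, so the model makes it easy to check a stage's output stream against a direct evaluation of equation (2).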

[125] Referring to FIGS. 7, 9, and 10, the designer may choose to 

alter the circuit 200 via the input device 154. 

[126] For example, the designer may swap out one or more of the
pipelines 44_1 - 44_7 with one or more other pipelines from the table 192.
Suppose the square-root pipeline 44_7 has a high precision but a relatively
long latency per the default rules that the generator 160 follows as
discussed above. If the table 192 includes another square-root pipeline
having a shorter latency, then the designer may replace the pipeline 44_7
with the other square-root pipeline, for example by using the input device
154 to "drag" the other pipeline from the table into the schematic
representation of the PLIC 60_8.

[127] In addition, the designer may swap out one or more of the
hardwired pipelines 44_1 - 44_7 with a symbolically defined polynomial series
(i.e., a Taylor series equivalent) that approximates one of the pipelined
operations. Suppose the available square-root pipeline 44_7 has insufficient
mathematical accuracy per the designer's specification and the default rules
that the generator 160 follows as discussed above. If the designer then
specifies a new square-root function as a series summation of related
monomials, then the front end 156, interpreter 158, and file generator 160
concatenate a series of parameterized monomial circuit templates into a
circuit that solves for square roots. In this way the designer replaces the
default pipeline 44_7 with the higher-precision square-root circuit using
symbolic design. This example illustrates the symbolic use of polynomials
to define new mathematical functions as established by Taylor's Theorem.
A more detailed example is discussed below in conjunction with FIGS. 13-17.






[128] The designer may also change the topology of the circuit 200.
Suppose that according to the default rules discussed above, the generator
160 places each instantiation of the hardwired pipelines 44_1 - 44_7 into a
separate PLIC 60. But also suppose that each PLIC 60 has sufficient
resources to hold multiple pipelines 44. Consequently, to reduce the
number of pipeline units 50 that the circuit 200 occupies, the designer may,
using the input device 154, move some of the pipelines 44 into the same
PLIC. For example, the designer may move both instantiations of the
multiplier pipeline 44_5 out of the PLICs 60_5 and 60_6 and into the PLIC
60_7 with the adder pipeline 44_6, thus reducing by two the number of PLICs
that the circuit 200 occupies. The designer then manually interconnects the
two instantiations of the pipeline 44_5 to the pipeline 44_6 within the PLIC
60_7, or may instruct the generator 160 to perform this interconnection.
Although the library 120 may not include a communication-shell template 112
that defines a communication shell 74 for this combination of multiple
pipelines 44_5 and 44_6, the designer or another may write such a template
and debug the communication shell that the template defines without having to
rewrite the interface-adapter-layer and framework-services templates 108_1
and 110_1 and, therefore, without having to re-debug the layers that these
templates define. This rearranging of pipelines 44 within the PLICs 60 is
also called "refactoring" the circuit 200.
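
The effect of this refactoring on the PLIC count can be checked with a small placement map. The sketch is illustrative only: the instance labels `44_5a` and `44_5b` (for the two multiplier instantiations) and the helper names are hypothetical, and resource limits of each PLIC are not modeled.

```python
# Default placement of FIG. 10: one pipeline instance per PLIC.
placement = {
    "44_1": "60_1", "44_2": "60_2", "44_3": "60_3", "44_4": "60_4",
    "44_5a": "60_5", "44_5b": "60_6",   # two instances of the multiplier
    "44_6": "60_7", "44_7": "60_8",
}


def plics_used(placement):
    """Number of distinct PLICs that hold at least one pipeline instance."""
    return len(set(placement.values()))


def move(placement, instances, target_plic):
    """Return a new placement with the given instances moved to one PLIC."""
    updated = dict(placement)
    for name in instances:
        updated[name] = target_plic
    return updated


refactored = move(placement, ["44_5a", "44_5b"], "60_7")
print(plics_used(placement), plics_used(refactored))  # 8 6
```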

[129] Moreover, the designer may decide to break down one or more
of the pipelines 44_1 - 44_7 into multiple, less complex pipelines 44. For
example, to equalize the latencies in the stage 206 of the circuit 200, the
designer may decide to break down the x^4 pipeline 44_3 into two x^2 pipelines
(not shown) and a multiplier pipeline 44_5. Or, the designer may decide to
replace the sin(x) pipeline 44_1 with a combination of pipelines (not shown)
that represents sin(x) in a series-expansion form (e.g., Taylor series,
MacLaurin series).
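
As a numerical illustration of the series-expansion form, sin(x) can be approximated by summing a finite number of MacLaurin monomials, each of which would map to multiplier and adder pipelines. The function name `maclaurin_sin` and the default number of terms are assumptions made for this sketch; the source does not specify how many terms a circuit would use.

```python
import math


def maclaurin_sin(x, terms=8):
    """Sum the first `terms` nonzero MacLaurin monomials of sin(x)."""
    total = 0.0
    for k in range(terms):
        n = 2 * k + 1                            # only odd powers appear
        coeff = (-1) ** k / math.factorial(n)    # constant coefficient
        total += coeff * x ** n                  # one monomial: coeff * x^n
    return total


approx = maclaurin_sin(0.5)
print(abs(approx - math.sin(0.5)) < 1e-12)  # True
```

The truncation error grows with |x|, so a practical design would bound the input range or add terms until the specified precision is met.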




[130] Referring to FIGS. 7 and 10, after the designer has made any
desired changes to the circuit 200, the generator 160 generates the file 162,
which describes the circuit in terms of the pipeline units 50, the PLICs 60,
the library templates that compose the circuit, and the interconnections
between the pipeline units. Specifically, assuming that the designer has not
modified the circuit 200 from the layout shown in FIG. 10, the file 162
indicates that the circuit is designed for instantiation on eight pipeline units
50_1 - 50_8 of a pipeline accelerator 14 (FIG. 1) that is compatible with
platform m = 1. The file 162 also identifies the eight PLICs 60_1 - 60_8 on
the eight pipeline units 50_1 - 50_8, and for each PLIC, identifies the
templates in the library 120 that define the circuitry to be instantiated on
the PLIC. For example, referring to FIGS. 6 and 10, the file 162 indicates
that the combination of the following templates in the library 120 defines
the circuitry to be instantiated on the PLIC 60_1: 101_1, 108_1, 110_1,
112_{1,1}, 114_1, and 116_1. Furthermore, the file 162 includes the values of
all constants defined in the configuration template 116_1. The file 162 may
also include one or more of the descriptions 128 - 134 and 138 corresponding
to these templates, or portions of these descriptions. Moreover, the file 162
defines the interconnections between the PLICs 60_1 - 60_8 and the message
specifications for these interconnections. The file 162 also defines any
designer-specified range constraints for generated values, exceptions, and
exception-handling routines. The generator 160 may write the file 162 in
XML or in another language with XML tags so that both humans and other
tools/machines can read the file. Alternatively, the generator 160 may write
the file 162 in a language other than XML and without XML tags.
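
A circuit-description file of this kind might be emitted as XML along the following lines. The sketch uses Python's standard `xml.etree.ElementTree` module; the element and attribute names (`circuit`, `plic`, `template`, `connection`, `platform`, `ref`, `from`, `to`) and the function `build_circuit_file` are hypothetical, since the source does not give the schema of the file 162.

```python
import xml.etree.ElementTree as ET


def build_circuit_file(platform, plics, connections):
    """Serialize PLICs, their templates, and interconnections as XML text."""
    root = ET.Element("circuit", {"platform": platform})
    for plic_id, templates in plics.items():
        plic = ET.SubElement(root, "plic", {"id": plic_id})
        for template in templates:
            ET.SubElement(plic, "template", {"ref": template})
    for source, destination in connections:
        ET.SubElement(root, "connection", {"from": source, "to": destination})
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment covering only the PLIC 60_1 and one interconnection.
xml_text = build_circuit_file(
    platform="1",
    plics={"60_1": ["101_1", "108_1", "110_1", "112_{1,1}", "114_1", "116_1"]},
    connections=[("60_1", "60_5")],
)
print(xml_text)
```

Keeping the template identifiers as references, rather than embedding the templates, matches the description of a file that identifies library templates instead of duplicating them.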

[131] Referring to FIGS. 6, 7, 9, and 10, the designer may instruct
the simulator 164, via the input device 154, to simulate the circuit 200 using
a conventional simulation algorithm. The simulator 164 uses the
information in the file 162 and the test vectors provided by the designer to
simulate the operation of the circuit 200. The simulator 164 first determines
the operating parameters of the hardware-interface layers 62_1 - 62_8 and of
the hardwired pipelines 44_1 - 44_7 from the file 162, or by extracting this
information directly from the description files 128_1, 130_1, 132_{1,1} -
132_{1,7}, and 138_1 - 138_7 in the library 120. As discussed above, these
parameters include, e.g., circuit latencies, and the precision (e.g.,
thirty-two-bit integer, sixty-four-bit floating point) of the values that the
pipelines 44_1 - 44_7 receive and generate. For example, from the description
files 128_1, 130_1, 132_{1,1}, and 138_1, the simulator 164 determines the
latency of the PLIC 60_1 from the time a value x enters the
hardware-interface layer 62_1 until the time that the layer 62_1 provides
sin(x) on an external pin (not shown) of the PLIC 60_1. The latency
information in these description files may be estimated information, or may
be actual information derived from an analysis of an instantiation of the
pipeline 44_1 and the hardware-interface layer 62_1 on the PLIC 60_1. The
simulator 164 then estimates the latencies and other operating parameters of
the PLICs 60_2 - 60_8 and simulates the operation of the circuit 200 to
generate an output test stream of values y in response to input test streams
of values x and z.
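
One plausible way to combine per-PLIC latencies into an end-to-end figure is to follow the slowest path through the interconnections. The sketch below is an assumption about how such an estimate could be formed, not a description of the simulator 164; the function `circuit_latency` and all cycle counts are hypothetical.

```python
def circuit_latency(latency, inputs, node):
    """Cycles until `node` produces output: own latency plus slowest input path."""
    upstream = inputs.get(node, [])
    slowest = max((circuit_latency(latency, inputs, u) for u in upstream), default=0)
    return latency[node] + slowest


# Hypothetical per-PLIC latencies in clock cycles.
latency = {
    "60_1": 20, "60_2": 6, "60_3": 6, "60_4": 20,   # sin(x), z^3, x^4, cos(z)
    "60_5": 4, "60_6": 4,                           # multipliers
    "60_7": 3,                                      # adder
    "60_8": 15,                                     # square root
}
# Interconnections of FIG. 10: which PLICs feed each PLIC.
inputs = {
    "60_5": ["60_1", "60_2"],
    "60_6": ["60_3", "60_4"],
    "60_7": ["60_5", "60_6"],
    "60_8": ["60_7"],
}
print(circuit_latency(latency, inputs, "60_8"))  # 42
```

When two inputs to a PLIC arrive with different latencies, as with the 20-cycle and 6-cycle paths here, the earlier stream would have to be delayed or buffered so that corresponding values meet; the sketch only reports the slower path.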

[132] FIG. 11 is a schematic diagram of the circuit 200 of FIG. 10
disposed on a single pipeline unit 50 and in a single PLIC 60 according to
an embodiment of the invention.

[133] Referring to FIGS. 6, 7, 9, and 11, the operation of the tool 152
is discussed according to another embodiment of the invention.

[134] Following the same steps described above in conjunction with
the formation of the circuit 200 of FIG. 10, the generator 160 determines
that all of the hardwired pipelines 44_1 - 44_7 (the multiplier pipeline 44_5
is instantiated twice) can fit within a single PLIC 60 with the same topology
shown in FIG. 10.

[135] Although the library 120 includes no communication-shell
templates 112 for this combination of the hardwired pipelines 44_1 - 44_7, for
simulation purposes the tool 152 derives the operational parameters and
message specifications of the hardware-interface layer 62 from the
description files 128_1, 130_1, 132_{1,1} - 132_{1,4}, and 132_{1,7}. Because
the PLIC 60 incorporates the interface-adapter layer 70 and
framework-services layer 72 defined by the templates 108_1 and 110_1, the
tool 152 estimates the input and output operational parameters, e.g., input
and output latencies, and the message specifications of the layers 70 and 72
directly from the description files 128_1 and 130_1. Then, referring to FIGS.
10-11, because the values x and z are input in parallel to the pipelines
44_1 - 44_4, the tool 152 derives the input operating parameters of the
communication shell 74 of FIG. 11 from the description files 132_{1,1} -
132_{1,4}, which describe the communication shells for the pipelines 44_1 -
44_4. For example, if the operational parameters of these communication
shells are similar, then the tool 152 may merely estimate that the input-side
operational parameters for the shell 74 are the same as the parameters from
one of the description files 132_{1,1} - 132_{1,4}. Alternatively, the tool
152 may estimate that an intermediate data-type translation is needed for the
input-side operational parameters of the communication shell 74, or that an
averaging operation is needed for the input-side operational parameters of
the communication shell, if the respective input-side parameters in the
description files 132_{1,1} - 132_{1,4} do not match. Similarly, because the
values y are output from the pipeline 44_7, the tool 152 derives the output
operating parameters for the communication shell 74 from the description file
132_{1,7}, which describes the communication shell for the pipeline 44_7. For
example, the tool 152 may estimate that the output-side operational
parameters for the shell 74 are the same as the output-side parameters from
the description file 132_{1,7}.
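
The input-side estimate described above (reuse one shell's parameter when the shells agree, otherwise average) can be sketched for a single numeric parameter such as input latency. The function name `estimate_input_latency`, the `tolerance` argument, and the sample values are assumptions for illustration; the source does not state how similarity is judged.

```python
def estimate_input_latency(shell_latencies, tolerance=0):
    """Estimate the combined shell's input latency from per-pipeline shells."""
    spread = max(shell_latencies) - min(shell_latencies)
    if spread <= tolerance:
        # Parameters are similar: reuse the value from one description file.
        return shell_latencies[0]
    # Parameters do not match: fall back to an averaging operation.
    return sum(shell_latencies) / len(shell_latencies)


print(estimate_input_latency([3, 3, 3, 3]))   # 3
print(estimate_input_latency([2, 4, 4, 6]))   # 4.0
```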

[136] Next, the generator 160 generates the file 162, which defines
the circuit 200 of FIG. 11, and the simulator 164 simulates the circuit using
the operational parameters calculated for the hardware-interface layer 62
by the generator 160.

[137] FIG. 12 is a block diagram of a circuit 220, for which the tool
152 of FIG. 7 generates a file 162 according to an embodiment of the
invention where the circuit solves for a variable in an equation that includes
constant coefficients. The circuit 220 is similar to the circuit 200 except that
the hardwired pipelines 44_9 and 44_8 respectively generate ax^4 and bz^3
instead of x^4 and z^3, where a and b are constant coefficients.

[138] In this embodiment, the designer wants to design a circuit to
solve for y in the following equation:

(3)    y = \sqrt{a x^{4} \cos(z) + b z^{3} \sin(x)}

The only difference between equation (3) and equation (2) is the presence
of the constant coefficients a and b.

[139] Referring to FIG. 10, one way for the tool 152 to generate such
a circuit by modifying the circuit 200 is to parse equation (3) into portions
including "a · x^4" and "b · z^3", and to add two corresponding PLICs (not
shown) on which are instantiated the multiplication pipeline 44_5: one such
multiplier PLIC between the PLICs 60_2 and 60_5 and receiving as inputs z^3
and b, and the other such multiplier PLIC between the PLICs 60_3 and 60_6
and receiving as inputs x^4 and a.

[140] Although such a modified circuit 200 is contemplated to
accommodate the constant coefficients a and b, this circuit would require
two additional pipeline units 50.

[141] Referring to FIGS. 7, 10, and 12, in this embodiment, however,
the tool 152 generates the circuit 220 by replacing the pipelines 44_2 and 44_3
in the circuit 200 with pipelines 44_8 and 44_9, which respectively perform the
operations bz^3 and ax^4. Of course this assumes that the section 124 of the
library 120 (FIG. 6) includes corresponding hardwired-pipeline templates
114_8 and 114_9.

[142] Referring to FIGS. 7 and 12, to set the values of the
coefficients a and b, the designer may enter the values as part of equation
(3), or may enter the values separately. Assume that the designer wants
a = 2.0 and b = 3.5. According to the former entry method, he enters equation
(3) as y = √(2x^4 cos(z) + 3.5z^3 sin(x)). And according to the latter entry
method, he enters equation (3) as y = √(ax^4 cos(z) + bz^3 sin(x)), and then
enters a = 2.0 and b = 3.5 separately.

[143] The generator 160 then generates the file 162 to include the
entered values for the coefficients a and b. These values may be contained
within one or more XML tags or be present in some other form.

[144] In another variation, the values of a and b may be provided to
the configuration managers 88 (FIG. 3) of the PLICs 60_3 and 60_2 as
soft-configuration data. More specifically, a configuration manager (not
shown and different from the configuration managers 80), which is
described in previously incorporated U.S. Patent App. Ser. No. (Attorney
Docket Nos. 1934-25-3, 1934-26-3, and 1934-36-3) and which is executed
by the host processor 12 (FIG. 1), initializes the values of a and b by
sending configuration messages for a and b to the pipeline units 50_3 and
50_2. The accelerator-configuration registry 40 (FIG. 1) may store a and b as
XML files to initialize the configuration messages created and sent by the
configuration manager executed by the host processor 12.

[145] Still referring to FIGS. 7 and 12, the tool 152 can use similar
techniques to set the values of constant coefficients for other types of circuit
portions such as filters, Fast Fourier Transformers (FFTs), and Inverse Fast
Fourier Transformers (IFFTs).

[146] Referring to FIGS. 7 - 12, other embodiments of the tool 152
and its operation are contemplated.




[147] For example, one or more of the functions of the tool 152 may
be performed by a functional block (e.g., front end 156, interpreter 158)
other than the block to which the function is attributed in the above
discussion.

[148] Furthermore, the tool 152 may be described using more or
fewer functional blocks. In addition, although the tool 152 is described as
either fitting the eight instantiations of the hardwired pipelines 44_1 - 44_7
in eight PLICs 60_1 - 60_8 (FIGS. 10 and 12) or in a single PLIC 60 (FIG. 11),
the tool 152 may fit these pipelines in more than one but fewer than eight
PLICs, depending on the resources available on each PLIC.

[149] Moreover, although described as allowing a designer to define 

a circuit using conventional mathematical symbols, alternate embodiments 
of the front end 156 of the tool 152 may lack this ability, or may allow one to
define a circuit using other formats or languages such as C++ or VHDL. 

[150] Furthermore, although the tool 152 is described as allowing
one to design a circuit for instantiation on a PLIC, the tool 152 may also
allow one to design a circuit for instantiation on an ASIC.

[151] In addition, although the tool 152 is described as generating a
file 162 that defines an algorithm-implementing circuit, such as the circuit
200 (FIG. 11), for instantiation on a specific pipeline accelerator 14 (FIG. 1)
or on a pipeline accelerator that is compatible with a specific platform, the
tool may generate, in addition to or instead of the file 162, a file (not shown)
that more generally defines the algorithm. Such a file may include
algorithm-definition data that is sometimes called "meta-data," and may
allow the host processor 12 (FIG. 1) to implement the algorithm in any
manner (e.g., hardwired pipeline(s), software, a combination of both
pipeline(s) and software) supported by the peer vector machine 10 (FIG. 1).
Typically, meta-data describes something, such as an algorithm or another
file, but is not executable. For example, the information in the description
files 126-134 (FIG. 6) may include meta-data. But a processor, such as the
host processor 12, may be able to generate executable code from
meta-data. Consequently, a meta-data file that defines an algorithm may
allow the host processor 12 to configure the peer vector machine 10 for
implementing the algorithm even where the machine does not support the
implementation(s) specified by the file 162. Such configuring of the peer
vector machine 10 is described in U.S. Patent Application Ser. No.
(Attorney Docket Nos. 1934-25-3, 1934-26-3, and 1934-36-3), which were
previously incorporated by reference.

[152] Moreover, the tool 152 may generate, and the library 120 (FIG.
6) may store, one or more meta-data files (not shown) for describing the
messages that carry data to/from the PLICs 60 (or software equivalents) of
a circuit, such as the circuit 200 (FIG. 10). For example, if the data
generated by the PLICs 60 is floating-point data, then a meta-data file
specifies this. The file 162 (FIG. 7) incorporates or points to these
meta-data files so that the host processor 12 (FIG. 1) can instantiate the
message objects that generate such messages as discussed in previously
incorporated U.S. Patent App. Ser. Nos. (Attorney Docket Nos. 1934-25-3,
1934-26-3, and 1934-36-3).

[153] Furthermore, the tool 152 may generate, and the library 120
(FIG. 6) may store, one or more meta-data files (not shown) for describing
the exceptions that the PLICs 60 (or software equivalents) of a circuit, such
as the circuit 200 (FIG. 10), generate. For example, if a PLIC 60
implements a divide-by-zero exception, then a meta-data file specifies this.
The file 162 (FIG. 7) incorporates or points to these meta-data files so that
the host processor 12 (FIG. 1) can instantiate corresponding exception
handlers as discussed in previously incorporated U.S. Patent App. Ser.
Nos. (Attorney Docket Nos. 1934-25-3, 1934-26-3, and 1934-36-3).

[154] In addition, the tool 152 may generate, and the library 120
(FIG. 6) may store, one or more meta-data files (not shown) for describing
the PLICs 60 (or software equivalents) of a circuit, such as the circuit 200
(FIG. 10). For example, such a meta-data file may describe the
mathematical operation performed by, and the input and output
specifications of, circuitry to be instantiated on a corresponding PLIC (or a
software equivalent of the circuitry). The file 162 (FIG. 7) incorporates or
points to these meta-data files so that the host processor 12 (FIG. 1) can 1)
determine which firmware files (or software equivalents) stored in the library
120 or in another library will respectively cause the PLICs (or the host
processor 12) to instantiate the desired circuitry, or 2) generate one or more
of these firmware files (or software equivalents) that are not otherwise
available, as described in previously incorporated U.S. Patent App. Ser.
Nos. (Attorney Docket Nos. 1934-25-3, 1934-26-3, and 1934-36-3).

[155] Moreover, the library 120 (FIG. 6) may store one or more of the
files 162 (FIG. 7) that the tool 152 generates, so that a designer can
incorporate previously designed circuits, such as the circuit 200 (FIG. 10),
into a new, larger, and more complex circuit. The tool 152 may then
generate a new file 162 that defines this new circuit.

[156] Referring to FIGS. 13-17, according to another embodiment of
the invention, the tool 152 (FIG. 7) allows one to design a circuit for
implementing virtually any complex function f(x) by expanding the function
into an equivalent infinite series. Many functions, such as f(x) = cos(x) and
f(x) = e^x, can be expanded into an infinite series, such as the Taylor series
or the following MacLaurin series, which is a special case (a = 0) of the
Taylor series:

(3) $f(x) = f(0) + \frac{f'(0)}{1!}x + \frac{f''(0)}{2!}x^{2} + \dots + \frac{f^{(n)}(0)}{n!}x^{n}$

Consequently, a combination of summing and multiplying hardwired
pipelines 44 interconnected to generate ax + bx^2 + cx^3 + ... + vx^n can
implement any function f(x) that one can expand into a MacLaurin series,
where the only differences in this combination of pipelines from function to
function are the values of the constant coefficients a, b, c, ..., v.
Therefore, if the tool 152 is programmed with, or otherwise has access to,
the coefficients for a number of functions f(x), then the tool can implement
any of these functions as a series expansion. Furthermore, because the
accuracy of the implementation of a function f(x) is proportional to the
number of expansion terms calculated and summed together, the tool 152
may set the number of expansion terms that the interconnected pipelines
44 generate based on the level of accuracy for f(x) that the circuit designer
(not shown) enters into the tool. Alternatively, a designer may directly enter
a function f(x) into the front end 156 (FIG. 7) of the tool 152 in
series-expansion form.
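
The coefficient-driven approach described above can be illustrated in software. The following is a minimal sketch only, not the tool 152 itself: the table COEFFS, the function names, and the use of a reference function to pick the term count are all assumptions made for illustration (a real tool would more likely use an analytic error bound).

```python
import math

# Hypothetical coefficient table: each entry maps a term index n to the
# MacLaurin coefficient f^(n)(0)/n! of the named function.
COEFFS = {
    "exp": lambda n: 1.0 / math.factorial(n),
    "cos": lambda n: 0.0 if n % 2 else (-1.0) ** (n // 2) / math.factorial(n),
}

def maclaurin(name, x, num_terms):
    """Sum the first num_terms terms c_n * x**n of the named expansion."""
    coeff = COEFFS[name]
    total = 0.0
    power = 1.0  # running power x**n, as a chain of multipliers would produce
    for n in range(num_terms):
        total += coeff(n) * power
        power *= x
    return total

def terms_for_accuracy(name, x, tol, reference, max_terms=60):
    """Smallest term count whose partial sum is within tol of reference(x).

    Comparing against a reference function is a simplification chosen for
    this sketch; it is not taken from the source.
    """
    for k in range(1, max_terms + 1):
        if abs(maclaurin(name, x, k) - reference(x)) <= tol:
            return k
    raise ValueError("tolerance not reached within max_terms")
```

Only the coefficient lookup differs between "exp" and "cos"; the multiply-and-sum loop is shared, which mirrors the point that the pipelines differ from function to function only in their constants.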

[157] FIG. 13 is a block diagram of a circuit 240 that the tool 152
(FIG. 7) defines for implementing f(x) = cos(x) as a MacLaurin series
according to an embodiment of the invention. For clarity, FIG. 13 shows
only the adders, multipliers, and delay blocks that compose the circuit 240,
it being understood that the tool 152 may define the circuit for instantiation
on one or more PLICs 60 using one or more hardwired pipelines 44 and
one or more hardware-interface layers 62 (e.g., FIGS. 10 and 12) per one
of the techniques described above in conjunction with FIGS. 7-12.
Furthermore, the circuit 240 may be part of a larger circuit (not shown) for
implementing an algorithm having cos(x) as one of its portions.

[158] f(x) = cos(x) is represented by the following MacLaurin series:

(4) $\cos(x) = 1 - \frac{1}{2!}x^{2} + \frac{1}{4!}x^{4} - \frac{1}{6!}x^{6} + \frac{1}{8!}x^{8} - \dots$

The circuit 240 includes a term-generating section 242 and a term-summing 
section 244. For clarity, only the parts of these sections that respectively 
generate and sum the first four power-of-x terms of the cos(x) series
expansion are shown, it being understood that any remaining portions of
these sections for respectively generating and summing the fifth and higher
power-of-x terms are similar.

[159] The term-generating section 242 includes a chain of multipliers
246_1-246_p (only multipliers 246_1-246_8 are shown) and delay blocks
248_1-248_q (only delay blocks 248_1-248_3 are shown) that generate the
power-of-x terms of the cos(x) series expansion. The delay blocks 248
ensure that the multipliers 246 only multiply powers of x from the same
sample time.

[160] The term-summing section 244 includes two summing paths: a
path 250 for positive numbers, and a path 252 for negative numbers. The
path 250 includes a chain of adders 254_1-254_r (only adders 254_1-254_2
are shown) and delay blocks 256_1-256_s (only blocks 256_1 and 256_2 are
shown). Similarly, the path 252 includes a chain of adders 258_1-258_t
(only adder 258_1 is shown) and delay blocks 260_1-260_u (only blocks
260_1 and 260_2 are shown). A final adder 262 sums the cumulative
positive and negative sums from the paths 250 and 252 to provide the value
for cos(x). Although the adder 262 is shown as summing the first five terms
of the expansion (1 and the first four power-of-x terms), it is understood
that the final adder 262 may be disposed further down the paths 250 and
252 if the circuit 240 generates additional terms of the cos(x) expansion.
Where numbers being summed are floating-point numbers, exceptions,
such as a mantissa-register underflow, may occur when a positive number
is summed with a negative number that is almost equal to the positive
number. But by providing separate summing paths 250 and 252 for
positive and negative numbers, respectively, the circuit 240 limits the
number of possible locations where such exceptions can occur to a single
adder 262. Consequently, providing the separate paths 250 and 252 may
significantly reduce the frequency of such floating-point exceptions, and
thus may reduce the time that the peer vector machine 10 (FIG. 1)
consumes handling such exceptions and
the size and complexity of the exception manager 86 (FIG. 4).

[161] Still referring to FIG. 13, the operation of the circuit 240 is
discussed according to an embodiment of the invention. For purposes of
explanation, it is assumed that each of the multipliers 246 and adders 254
and 258 has a latency (i.e., delay) D of one clock cycle. For example, prior
to a first clock edge, a value x is present at the inputs of the multiplier
246_1, and after the first clock edge, the value x^2 is present at the output
of the multiplier 246_1. It is understood, however, that the multipliers 246
and adders 254 and 258 may have different latencies and latencies other
than one, and that the delays provided by the blocks 248, 256, and 260 may
be adjusted accordingly.

[162] At a start time, a value x_1 is present at the input of the multiplier
246_1, where the subscript "1" denotes the time or position of x_1 relative
to the other values of x.

[163] In response to a first clock edge, a value x_2 is present at the
input of the multiplier 246_1, and x_1^2 is present at the output of this
multiplier. For brevity, this example follows only the propagation of x_1, it
being understood that the propagation of x_2 and subsequent values of x is
similar but delayed relative to the propagation of x_1. Furthermore, for
clarity, x_1 is hereinafter referred to as "x" in this example.

[164] In response to a second clock edge, -x^2/2! is present at the
output of the multiplier 246_2, x^4 is present at the output of the multiplier
246_3, and x^2 is available at the output of the block 248_1.

[165] In response to a third clock edge, "1" is present at the output of
the block 256_1, x^4/4! is present at the output of the multiplier 246_4, x^6
is present at the output of the multiplier 246_5, and x^2 is available at the
output of the block 248_2.

[166] In response to a fourth clock edge, -x^6/6! is present at the
output of the multiplier 246_6, x^8 is present at the output of the multiplier
246_7, x^2 is available at the output of the block 248_3, and "1 + x^4/4!" is
available at the output of the adder 254_1.

[167] In response to a fifth clock edge, x^8/8! is present at the output
of the multiplier 246_8, "1 + x^4/4!" is available at the output of the block
256_2, and "-x^2/2! - x^6/6!" is available at the output of the adder 258_1.

[168] In response to a sixth clock edge, "1 + x^4/4! + x^8/8!" is
available at the output of the adder 254_2, and "-x^2/2! - x^6/6!" is
available at the output of the block 260_2.

[169] And in response to a seventh clock edge, "cos(x) = 1 - x^2/2! +
x^4/4! - x^6/6! + x^8/8!" (cos(x) approximated to the first four power-of-x
terms of the MacLaurin series expansion) is available at the output of the
adder 262. Therefore, in this example the latency of the circuit 240 (i.e., the
number of clock cycles from when x is available at the inputs of the
multiplier 246_1 to when cos(x) is available at the output of the adder 262)
is seven clock cycles. Furthermore, if the adder 262, while summing a
positive number and a negative floating-point number, generates an
exception, the exception manager 86 (FIG. 4) or the host processor 12
(FIG. 1) may handle this exception using a conventional
floating-point-exception routine.
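
The separate positive and negative summing paths can be mimicked in software to show that only the final addition mixes signs. This is a minimal sketch under stated assumptions: the function name and the loop structure are illustrative, and the code models arithmetic only, not the clocking or delay blocks of FIG. 13.

```python
import math

def cos_split_paths(x, num_power_terms=4):
    """Approximate cos(x) with separate positive and negative accumulators."""
    x2 = x * x
    power = 1.0      # holds x**(2k) as the loop advances
    positive = 1.0   # stands in for path 250, seeded with the constant "1"
    negative = 0.0   # stands in for path 252
    for k in range(1, num_power_terms + 1):
        power *= x2
        term = power / math.factorial(2 * k)
        if k % 2:
            negative -= term   # -x^2/2!, -x^6/6!, ...
        else:
            positive += term   # +x^4/4!, +x^8/8!, ...
    # The only addition of a positive and a negative operand,
    # corresponding to the role of the final adder 262.
    return positive + negative
```

Every intermediate addition combines operands of like sign, so near-cancellation can arise only in the return statement.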

[170] Alternatively, if the circuit 240 calculates one or more higher
power-of-x terms, then the adder 262 is located after (to the right in FIG. 13)
the adder that sums the highest generated term to a preceding term, and
the operation continues as above.

[171] Still referring to FIG. 13, alternate embodiments of the circuit
240 are contemplated. For example, the circuit 240 may include multipliers
and adders to generate and sum the odd power-of-x terms (e.g., x, x^3, x^5)
with the coefficients of these terms set to zero. Such an alternate circuit
240 is more flexible because it allows one to implement function expansions
that include odd powers of x, but in this case would have a greater latency
than seven clock cycles.

[172] FIG. 14 is a block diagram of a circuit 270 that the tool 152
(FIG. 7) defines for implementing f(x) = cos(x) as a MacLaurin series
according to another embodiment of the invention. The circuit 270 has a
topology that reduces the number of delay blocks and the latency as
compared to the circuit 240 of FIG. 13. Furthermore, like FIG. 13, FIG. 14
shows only the adders, multipliers, and delay blocks that compose the
circuit 270, it being understood that the tool 152 may define the circuit for
instantiation on one or more PLICs 60 using one or more hardwired
pipelines 44 and one or more hardware-interface layers 62 (e.g., FIGS. 10
and 12) per one of the techniques described above in conjunction with
FIGS. 7-12. Furthermore, like the circuit 240, the circuit 270 may be part of
a larger circuit (not shown) for implementing an algorithm having cos(x) as
one of its portions.

[173] The circuit 270 includes a term-generating section 272 and a
term-summing section 274. For clarity, only the parts of these sections that
respectively generate and sum the first four power-of-x terms of the cos(x)
series expansion are shown, it being understood that any remaining
portions of these sections for respectively generating and summing the fifth
and higher power-of-x terms are similar.

[174] The term-generating section 272 includes a hierarchy of
multipliers 276_1-276_p (only multipliers 276_1-276_8 are shown) and delay
blocks 278_1-278_q (only some of the delay blocks 278 are shown) that
generate the power-of-x terms of the cos(x) series expansion. The delay
blocks 278 ensure that the multipliers 276 only multiply powers of x from
the same sample time.

[175] The term-summing section 274 includes two summing paths: a
path 280 for positive numbers, and a path 282 for negative numbers. The
path 280 includes a chain of adders 284_1-284_r (only adders 284_1-284_2
are shown) and delay blocks 286_1-286_s (only block 286_1 is shown).
Similarly, the path 282 includes a chain of adders 288_1-288_t (only adder
288_1 is shown) and delay blocks 290_1-290_u (only block 290_1 is shown).
A final adder 292 sums the cumulative positive and negative sums from the
paths 280 and 282 to provide the value for cos(x). Although the adder 292
is shown as summing the first five terms of the expansion (1 and the first
four power-of-x terms), it is understood that the final adder 292 may be
disposed further down the paths 280 and 282 if the circuit 270 generates
additional terms of the cos(x) expansion.

[176] Still referring to FIG. 14, the operation of the circuit 270 is
discussed according to an embodiment of the invention. For purposes of
explanation, it is assumed that each of the multipliers 276 and adders 284
and 288 has a latency (i.e., delay) D of one clock cycle. It is understood,
however, that the multipliers 276 and adders 284 and 288 may have
different latencies and latencies other than one, and that the delays
provided by the blocks 278, 286, and 290 may be adjusted accordingly.

[177] At a start time, a value x is present at the input of the multiplier
276_1.

[178] In response to a first clock edge, x^2 is present at the output of
the multiplier 276_1.

[179] In response to a second clock edge, x^4 is present at the output
of the multiplier 276_2, and x^2 is available at the output of the block 278_1.

[180] In response to a third clock edge, "1" is present at the output of
the block 286_1, x^4/4! is present at the output of the multiplier 276_6, x^6
is present at the output of the multiplier 276_4, -x^2/2! is available at the
output of the multiplier 276_5, and x^8 is available at the output of the
multiplier 276_3.

[181] In response to a fourth clock edge, -x^6/6! is present at the
output of the multiplier 276_7, x^8/8! is present at the output of the
multiplier 276_8, -x^2/2! is available at the output of the block 290_1, and
"1 + x^4/4!" is available at the output of the adder 284_1.

[182] In response to a fifth clock edge, "1 + x^4/4! + x^8/8!" is available
at the output of the adder 284_2, and "-x^2/2! - x^6/6!" is available at the
output of the adder 288_1.

[183] And in response to a sixth clock edge, "cos(x) = 1 - x^2/2! +
x^4/4! - x^6/6! + x^8/8!" (cos(x) approximated to the first four power-of-x
terms of the MacLaurin series expansion) is available at the output of the
adder 292. Therefore, in this example the latency of the circuit 270 is six
clock cycles, which is one fewer clock cycle than the latency of the circuit
240 of FIG. 13. But as the number of the power-of-x terms increases
beyond four, the gap between the latencies of the circuits 270 and 240
increases such that the circuit 270 provides an even greater improvement
in the latency.
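
The widening latency gap can be seen by comparing when each even power of x first becomes available in the two term-generating topologies. The sketch below is an illustrative model only: it assumes one-cycle multipliers, a serial chain that produces x^(2j) at clock edge j, and a hierarchy in which each level multiplies two powers produced at earlier levels. The function names are hypothetical, and the model covers the power-generating multipliers only, not the coefficient multipliers or the summing paths.

```python
def chain_ready(j):
    """Clock edge at which x**(2*j) leaves a serial multiplier chain."""
    if j < 1:
        raise ValueError("j must be at least 1")
    return j

def hierarchy_ready(j):
    """Clock edge at which x**(2*j) leaves a hierarchy of multipliers.

    x**2 is ready after one edge; each further level can reach powers up to
    twice the highest power of the previous level, giving 1 + ceil(log2(j)).
    """
    if j < 1:
        raise ValueError("j must be at least 1")
    return 1 + (j - 1).bit_length()

if __name__ == "__main__":
    for j in (1, 2, 3, 4, 8, 16):
        print(f"x^{2 * j}: chain edge {chain_ready(j)}, "
              f"hierarchy edge {hierarchy_ready(j)}")
```

For x^8 the model gives edge 4 for the chain and edge 3 for the hierarchy, matching the one-cycle difference described above; for higher powers the chain grows linearly while the hierarchy grows logarithmically.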

[184] Alternatively, if the circuit 270 calculates one or more higher
power-of-x terms, then the adder 292 is located after (to the right in FIG. 14)
the adder that sums the highest generated term to a preceding term, and
the operation continues as above.

[185] Still referring to FIG. 14, alternate embodiments of the circuit
270 are contemplated. For example, the circuit 270 may include multipliers
and adders to generate and sum the odd power-of-x terms (e.g., x, x^3, x^5)
with the coefficients of these terms set to zero. Such an alternate circuit
270 may be more flexible because it allows one to implement function
expansions that include odd powers of x without increasing the circuit's
latency for a given highest power of x. That is, where the highest power of
x generated by the circuit 270 is x^8, adding multipliers and adders to
generate x^3, x^5, and x^7 would not increase the latency of the circuit 270
beyond six clock cycles. This is because the circuit 270 would generate the
power-of-x terms in parallel, not serially like the circuit 240 of FIG. 13.

[186] FIG. 15 is a block diagram of a power-of-x term generator 300
that the tool 152 (FIG. 7) defines to replace the odd-numbered
power-of-x-term multipliers 246_3, 246_5, 246_7, ... of the term-generating
section 242 of FIG. 13 and the power-of-x-term multipliers 276_1, 276_2,
276_3, 276_4, ... of FIG. 14 according to an embodiment of the invention.
Generally, the generator 300 includes fewer multipliers (here one) than the
term-generating sections 242 and 272 (which each include eight
multipliers), but may have a higher latency for a given number of generated
power-of-x terms. Furthermore, like FIGS. 13-14, FIG. 15 shows only the
multipliers and other components that compose the term generator 300, it
being understood that the tool 152 may define a circuit that includes the
term generator for instantiation on one or more PLICs 60 using one or more
hardwired pipelines 44 and one or more hardware-interface layers 62 (e.g.,
FIGS. 10 and 12) per one of the techniques described above in conjunction
with FIGS. 7-12.

[187] The term generator 300 includes a register 302 for storing x, a
multiplier 304, a multiplexer 306, and term-storage registers 308_1-308_p
(only registers 308_1-308_4 are shown). For clarity, only the parts of the
generator 300 that generate the first four power-of-x terms of the cos(x)
series expansion are shown, it being understood that any remaining
portions of the generator for generating the fifth and higher power-of-x
terms are similar.

[188] Still referring to FIG. 15, the operation of the circuit 300 is
discussed according to an embodiment of the invention. For purposes of
explanation, it is assumed that each of the register 302, multiplier 304, and
registers 308 has a respective latency (i.e., delay) of one clock cycle, and
that the multiplexer 306 is not clocked, i.e., is asynchronous. It is
understood, however, that the register 302, multiplier 304, and registers 308
may have different latencies and latencies other than one, that the
multiplexer 306 may be clocked and have a latency of one or more clock
cycles, and that the term-summing sections 244 and 274 of FIGS. 13 and
14, respectively, may be adjusted accordingly.

[189] At a start time, a value x is present at the input of the register
302.

[190] In response to a first clock edge, the current value of x is
loaded into, and thus is present at the output of, the register 302, and is
present at the output of the multiplexer 306, which couples its input 312 to
its output. The register 302 is then disabled. Alternatively, the register 302
is not disabled but the value of x at the input of this register does not
change.

[191] In response to a second clock edge, x^2 is present at the output
of the multiplier 304, and the multiplexer changes state and couples its
input 314 to its output such that x^2 is also present at the output of the
multiplexer 306.

[192] In response to a third clock edge, x^2 is loaded into, and thus is
available at the output of, the register 310_1, and x^3 is available at the
output of the multiplier 304 and at the output of the multiplexer 306.

[193] In response to a fourth clock edge, x^4 is available at the output
of the multiplier 304 and at the output of the multiplexer 306.

[194] In response to a fifth clock edge, x^4 is loaded into, and thus is
available at the output of, the register 310_2, and x^5 is available at the
output of the multiplier 304 and at the output of the multiplexer 306.

[195] In response to a sixth clock edge, x^6 is available at the output
of the multiplier 304 and at the output of the multiplexer 306.

[196] In response to a seventh clock edge, x^6 is loaded into, and thus
is available at the output of, the register 310_3, and x^7 is available at the
output of the multiplier 304 and at the output of the multiplexer 306.

[197] In response to an eighth clock edge, x^8 is available at the
output of the multiplier 304 and at the output of the multiplexer 306.

[198] And in response to a ninth clock edge, x^8 is loaded into, and
thus is available at the output of, the register 310_4, and the next value of x
is loaded into the register 302. But if the generator 300 generates powers
of x higher than x^8, the generator continues operating in the described
manner before loading the next value of x into the register 302.

[199] After the generator 300 generates all of the specified powers of
the current value of x, the register 302, multiplier 304, multiplexer 306, and
registers 310 repeat the above procedure for each subsequent value of x.
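
The clock-edge sequence above can be replayed with a short behavioral model. This is a sketch under the stated one-cycle-latency assumptions, not a description of the actual circuit; the function name and the returned dictionary layout are hypothetical.

```python
def run_generator(x, highest_power=8):
    """Return {power: (load_edge, value)} for the even powers of x.

    Edge 1 loads x into the input register. From edge 2 onward, the single
    multiplier produces one new power of x per clock edge (x**p at edge p),
    and each even power is captured in a term register one edge later.
    """
    stored = {}
    feedback = x  # multiplexer output after the first clock edge
    for edge in range(2, highest_power + 1):
        feedback = feedback * x  # multiplier output at this edge: x**edge
        if edge % 2 == 0:
            stored[edge] = (edge + 1, feedback)  # register adds one cycle
    return stored
```

For example, run_generator(2.0) reports x^8 = 256.0 loaded at the ninth clock edge, which agrees with the sequence described above and shows the latency cost of using one multiplier instead of a chain or hierarchy.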

[200] Alternative embodiments of the generator 300 are
contemplated. For example, to generate the odd powers of x for a function
other than cos(x), one can merely add additional registers 310 to store
these values, because the multiplier 304 inherently generates these odd
powers. Alternatively, the generator 300 may be modified to load x^2 into
the register 302 so that the multiplier 304 thereafter generates only even
powers of x. Moreover, one or more of the registers 308 may be
eliminated, and the multiplexer 306 may feed the respective powers of x
directly to the term multipliers, e.g., the term multipliers 246_2, 246_4,
246_6, 246_8, ... of FIG. 13 and the term multipliers 276_5, 276_6, 276_7,
276_8 of FIG. 14.

[201] FIG. 16 is a block diagram of a circuit 320 that the tool 152
(FIG. 7) defines for implementing f(x) = e^x as a MacLaurin series according
to an embodiment of the invention. The circuit 320 is similar to the circuit
240 of FIG. 13, but because the odd power-of-x terms for the e^x expansion
may be positive or negative, the circuit 320 also includes sign determiners
(described below and in conjunction with FIG. 17) that respectively provide
these odd-power-of-x terms to the proper path (positive or negative) of the
term-summing section. For clarity, FIG. 16 shows only the adders,
multipliers, delay blocks, and sign determiners that compose the circuit 320,
it being understood that the tool 152 may define the circuit for instantiation
on one or more PLICs 60 using one or more hardwired pipelines 44 and
one or more hardware-interface layers 62 (e.g., FIGS. 10 and 12) per one
of the techniques described above in conjunction with FIGS. 7-12.
Furthermore, the circuit 320 may be part of a larger circuit (not shown) for
implementing an algorithm having e^x as one of its portions.

[202] f(x) = e^x is represented by the following MacLaurin series:

(5) $e^{x} = 1 + x + \frac{1}{2!}x^{2} + \frac{1}{3!}x^{3} + \frac{1}{4!}x^{4} + \frac{1}{5!}x^{5} + \dots$

The circuit 320 includes a term-generating section 322 and a term-summing
section 324, which includes positive- and negative-value summing paths
326 and 328. For clarity, only the parts of these sections that respectively
generate and sum the first five power-of-x terms of the e^x series expansion
are shown, it being understood that any remaining portions of these
sections for respectively generating and summing the sixth and higher
power-of-x terms are similar.

[203] The term-generating section 322 includes a chain of multipliers
330_1-330_p (only multipliers 330_1-330_8 are shown) and delay blocks
332_1-332_q (only delay blocks 332_1-332_4 are shown) that generate the
power-of-x terms of the e^x series expansion. The section 322 also
includes, for each odd-power-of-x term (e.g., x, x^3, x^5, ...), a respective
sign determiner 334_1-334_v (only determiners 334_1-334_3 are shown)
that directs positive values of the odd-power-of-x term to the positive
summing path 326 of the term-summing section 324, and that directs
negative values of the odd-power-of-x term to the negative summing path
328.

[204] The positive-value path 326 of the term-summing section 324
includes a chain of adders 336_1-336_r (only adders 336_1-336_5 are
shown) and delay blocks 338_1-338_s (only some of the blocks 338 are
shown). Similarly, the negative-value path 328 includes a chain of adders
340_1-340_t (only adders 340_1-340_2 are shown) and delay blocks
342_1-342_u (only blocks 342_1-342_2 are shown). A final adder 344
sums the cumulative positive and negative sums from the paths 326 and
328 to provide the value for e^x. Although the final adder 344 is shown as
summing the first six terms of the e^x expansion ("1" and the first five
power-of-x terms), it is understood that the final adder may be disposed
further down the paths 326 and 328 if the circuit 320 generates additional
terms of the expansion.

[205] Still referring to FIG. 16, the operation of the circuit 320 is
discussed according to an embodiment of the invention. For purposes of
explanation, it is assumed that each of the multipliers 330, sign determiners
334, and adders 336 and 340 has a latency (i.e., delay) D of one clock
cycle. It is understood, however, that the multipliers 330, sign determiners
334, and adders 336 and 340 may have different latencies and latencies
other than one, and that the delays provided by the blocks 332, 338, and
342 may be adjusted accordingly.

[206] At a start time, a value x is present at both inputs of the
multiplier 330_1, at the input of the delay block 332_1, and at the input of the
sign determiner 334_1.

[207] In response to a first clock edge, x^2 is available at the output of
the multiplier 330_1, x is available at the output of the delay block 332_1,
and "1" is available at the output of the delay block 338_1. Furthermore, if x
is positive, x and logic "0" are respectively available at the (+) and (-)
outputs of the sign determiner 334_1; conversely, if x is negative, logic "0"
and x are respectively available at the (+) and (-) outputs of the determiner
334_1.

[208] In response to a second clock edge, x^2/2! is available at the
output of the multiplier 330_2, x^3 is present at the output of the multiplier
330_3, and x is available at the output of the delay block 332_2.
Furthermore, if x is positive, "1 + x" is available at the output of the adder
336_1; conversely, if x is negative, "1 + 0 = 1" is present at the output of the
adder 336_1.

[209] In response to a third clock edge, x^3/3! is available at the output
of the multiplier 330_4, x^4 is available at the output of the multiplier 330_5,
x is available at the output of the delay block 332_3, and "1 + x + x^2/2!" (x
positive) or "1 + x^2/2!" (x negative) is available at the output of the adder
336_2.

[210] In response to a fourth clock edge, x^4/4! is present at the output
of the multiplier 330_6, x^5 is present at the output of the multiplier 330_7,
x is available at the output of the block 332_4, and "1 + x + x^2/2!" (x
positive) or "1 + x^2/2!" (x negative) is available at the output of the delay
block 338_2. Furthermore, if x^3/3!, and thus x, is positive, x^3/3! and logic
"0" are respectively present at the (+) and (-) outputs of the sign determiner
334_2; conversely, if x^3/3!, and thus x, is negative, logic "0" and x^3/3! are
respectively present at the (+) and (-) outputs of the determiner 334_2.
Moreover, if x is negative, then x is available at the output of the delay block
342_1; conversely, if x is positive, then logic "0" is available at the output of
the delay block 342_1.

[211] In response to a fifth clock edge, x^5/5! is available at the output
of the multiplier 330_8, "1 + x + x^2/2! + x^3/3!" (x positive) or "1 + x^2/2!" (x
negative) is available at the output of the adder 336_3, x^4/4! is available at
the output of the delay block 338_3, and "0" (x positive) or "-x - x^3/3!" (x
negative) is available at the output of the adder 340_1.

[212] In response to a sixth clock edge, if x^5/5!, and thus x, is
positive, x^5/5! and logic "0" are respectively available at the (+) and (-)
outputs of the sign determiner 334_3; conversely, if x^5/5!, and thus x, is
negative, logic "0" and x^5/5! are respectively available at the (+) and (-)
outputs of the determiner 334_3. Furthermore, "1 + x + x^2/2! + x^3/3! +
x^4/4!" (x positive) or "1 + x^2/2! + x^4/4!" (x negative) is available at the
output of the adder 336_4, and "0" (x positive) or "-x - x^3/3!" (x negative) is
available at the output of the delay block 342_2.

[213] In response to a seventh clock edge, "1 + x + x^2/2! + x^3/3! +
x^4/4! + x^5/5!" (x positive) or "1 + x^2/2! + x^4/4!" (x negative) is available
at the output of the adder 336_5, and "0" (x positive) or "-x - x^3/3! - x^5/5!"
(x negative) is available at the output of the adder 340_2.

[214] And in response to an eighth clock edge, "e^x = 1 + x + x^2/2! +
x^3/3! + x^4/4! + x^5/5!" (x positive) or "e^x = 1 - x + x^2/2! - x^3/3! + x^4/4!
- x^5/5!" (x negative) is available at the output of the adder 344.

[215] Therefore, in this example, the latency of the circuit 320 is
eight clock cycles. Furthermore, if the adder 344, while summing a positive
number and a negative floating-point number, generates an exception, the
exception manager 86 (FIG. 4) or the host processor 12 (FIG. 1) may
handle this exception using a conventional floating-point-exception routine.
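
The routing of odd-power terms by sign can be sketched in software as follows. This is an illustrative model only: the function name is hypothetical, the code uses signed values of x throughout, and it models the arithmetic of the two summing paths rather than the clocked behavior of FIG. 16.

```python
import math

def exp_split_paths(x, highest_power=5):
    """Approximate e**x, routing each term to a positive or negative sum."""
    positive = 1.0   # stands in for path 326, seeded with the constant "1"
    negative = 0.0   # stands in for path 328
    power = 1.0      # running power x**n
    for n in range(1, highest_power + 1):
        power *= x
        term = power / math.factorial(n)
        if n % 2 and term < 0:
            negative += term   # odd power of a negative x: negative path
        else:
            positive += term   # even powers and non-negative odd terms
    # Only this final addition can combine operands of opposite sign,
    # corresponding to the role of the final adder 344.
    return positive + negative
```

With the first five power-of-x terms, the result agrees with e^x to within roughly x^6/6! for small |x|, whether x is positive or negative.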

[216] Alternatively, if the circuit 320 calculates one or more
power-of-x terms higher than the fifth power, then the adder 344 is located
after (to the right in FIG. 16) the adder 336 or 340 that sums the highest
generated term to a preceding term, and the operation continues as above.

[217] Still referring to FIG. 16, alternate embodiments of the circuit
320 are contemplated. For example, one may replace the term-generating
section 322 with a section similar to the term-generating section 272 of FIG.
14, or may replace the chain of multipliers 330 with a power-of-x generator
similar to the generator 300 of FIG. 15.

[218] FIG. 17 is a block diagram of the sign determiner 334_1 of FIG.
16 according to an embodiment of the invention, it being understood that
the sign determiners 334_2-334_v are similar.

[219] The sign determiner 334_1 includes an input node 350, a (-)
output node 352, a (+) output node 354, a register 356 that stores a logic
"0", and demultiplexers 358 and 360.

[220] The demultiplexer 358 includes a control node 362 coupled to
receive a sign bit of the value at the input node 350, a (-) input node 364
coupled to the input node 350, a (+) input node 366 coupled to the register
356, and an output node 368 coupled to the (-) output node 352.

[221] Similarly, the demultiplexer 360 includes a control node 370
coupled to receive the sign bit of the value at the input node 350, a (-) input
node 372 coupled to the register 356, a (+) input node 374 coupled to the
input node 350, and an output node 376 coupled to the (+) output node 354.

[222] Still referring to FIG. 17, two operating modes of the sign
determiner 334_1 are described according to an embodiment of the
invention.

[223] In one operating mode, the sign determiner 334_1 receives at its
input node 350 a positive (+) value v, which, therefore, includes a positive
sign bit. This sign bit is typically the most-significant bit of v, although the
sign bit may be any other bit of v. In response to the positive sign bit, the
demultiplexer 360 couples v (including the sign bit) from its (+) input node
374 to its output node 376, and thus to the (+) output node 354 of the sign
determiner 334_1. Furthermore, the demultiplexer 358 couples the logic "0"
stored in the register 356 from the (+) input node 366 to the output node
368, and thus to the (-) output node 352 of the sign determiner 334_1.

[224] In the other operating mode, the sign determiner 334₁ receives at
its input node 350 a negative (-) value v, which, therefore, includes a
negative sign bit. In response to the negative sign bit, the demultiplexer
358 couples v (including the sign bit) from its (-) input node 364 to its
output node 368, and thus to the (-) output node 352 of the sign determiner
334₁. Furthermore, the demultiplexer 360 couples the logic "0" stored in the
register 356 from the (-) input node 372 to the output node 376, and thus to
the (+) output node 354 of the sign determiner 334₁.
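
The two operating modes described above can be summarized in a short
behavioral sketch. The Python below is only an illustration and is not part
of the disclosed circuit: the function name sign_determiner, the 32-bit word
width, and the use of the most-significant bit as the sign bit are
assumptions made for the example (the sign bit may be any other bit of v).

```python
from __future__ import annotations


def sign_determiner(v: int, width: int = 32) -> tuple[int, int]:
    """Behavioral model of one sign determiner (illustrative only).

    v is treated as a raw `width`-bit word whose most-significant bit is
    assumed to be the sign bit. Returns (minus_out, plus_out), which model
    the (-) output node 352 and the (+) output node 354.
    """
    mask = (1 << width) - 1
    v &= mask                           # keep only the bits of the word
    sign_bit = (v >> (width - 1)) & 1   # 1 means negative, 0 means positive
    zero = 0                            # the logic "0" held in register 356
    # Demultiplexer 358 drives the (-) output; 360 drives the (+) output.
    minus_out = v if sign_bit else zero
    plus_out = zero if sign_bit else v
    return minus_out, plus_out
```

For example, a word with a clear sign bit such as 5 appears unchanged on the
(+) output while the (-) output carries zero; a word with the sign bit set,
such as 0x80000003, appears unchanged on the (-) output instead.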

[225] Still referring to FIG. 17, alternative embodiments of the sign
determiner 334₁ are contemplated. For example, one may replace the logic
"0" register with a component, such as a pull-down resistor, coupled to a
logic "0" voltage level, such as ground.

[226] Referring to FIGS. 1-17, alternate embodiments of the peer vector
machine 10 are contemplated. For example, some or all of the components of
the peer vector machine 10, such as the host processor 12 (FIG. 1) and the
pipeline units 50 (FIG. 3) of the pipeline accelerator 14 (FIG. 1), may be
disposed on a single integrated circuit.

[227] The preceding discussion is presented to enable a person skilled in
the art to make and use the invention. Various modifications to the
embodiments will be readily apparent to those skilled in the art, and the
generic principles herein may be applied to other embodiments and
applications without departing from the spirit and scope of the present
invention. Thus, the present invention is not intended to be limited to the
embodiments shown, but is to be accorded the widest scope consistent with
the principles and features disclosed herein.

WHAT IS CLAIMED IS:

1. A computer-based design tool, comprising:

a front end operable to receive symbols that define an algorithm;

an interpreter coupled to the front end and operable to parse the
algorithm into respective algorithm portions; and

a generator coupled to the interpreter and operable to

identify a corresponding circuit template for each of the
algorithm portions, each template defining a circuit for executing the
respective algorithm portion, and

interconnect the identified templates such that the
interconnected templates define a circuit that is operable to execute
the algorithm.

2. The design tool of claim 1 wherein the generator is operable to
generate a file comprising:

a respective pointer to each of the identified templates within a 
template library; and 

a list of the interconnections between the identified templates. 

3. The design tool of claim 1 wherein the symbols comprise
mathematical symbols.

4. The design tool of claim 1 wherein the interpreter is operable to
parse the algorithm into respective algorithm portions that each correspond 
to a template in a library. 

5. The design tool of claim 1 wherein the generator is operable to
identify the corresponding templates by accessing a library that includes the
identified templates.

6. The design tool of claim 1, further comprising a library coupled
to the generator and operable to store the identified templates.

7. The design tool of claim 1, further comprising a simulator
coupled to the generator and operable to simulate operation of the circuit by
determining a transfer function of the circuit defined by the interconnected
templates.

8. The design tool of claim 1 wherein:

the front end is further operable to receive a desired operational
characteristic of the circuit; and

the generator is further operable to identify the corresponding
template for each of the algorithm portions such that the interconnection of
identified templates defines the circuit having the desired operational
characteristic.

9. The design tool of claim 8 wherein the desired operational
characteristic comprises a latency of the circuit.

10. The design tool of claim 1 wherein:

the front end is further operable to receive an identity of a platform;
and

the generator is further operable to,

identify a hardware-abstraction-layer template corresponding to
the platform, and

interconnect the identified circuit templates to the identified
layer template such that the interconnection of the templates defines
the electronic circuit.

11. The tool of claim 1 wherein:

the front end is further operable to receive an identity of a platform;
and

the generator is further operable to determine whether the circuit
defined by the interconnection of the identified templates can be
instantiated on the identified platform.

12. A computer-based design tool, comprising:

a front end operable to receive symbols that define an algorithm;

a generator coupled to the front end and operable to,

identify a template that defines a first electronic circuit that is
operable to execute the algorithm,

identify a template that defines a hardware interface that is
compatible with the first electronic circuit, and

interconnect the identified templates to define a resulting
electronic circuit that includes the first circuit interconnected to the
hardware interface.

13. A method, comprising:

parsing an algorithm into a combination of respective smaller
algorithms;

identifying a corresponding template for each of the smaller
algorithms, each template defining a respective circuit that is operable to
execute the respective smaller algorithm; and

interconnecting the identified templates such that the interconnected
templates define an electronic circuit that is operable to execute the
algorithm.

14. The method of claim 13, further comprising:

generating a respective pointer to each of the identified templates
within a library; and

generating a list of the interconnections between the identified
templates.

15. The method of claim 13, further comprising:

receiving an expression of mathematical symbols that defines the
algorithm; and

wherein parsing the algorithm comprises parsing the expression into
groups of symbols that respectively define the smaller algorithms.

16. The method of claim 13 wherein identifying the corresponding
templates comprises searching a library that includes the corresponding
templates.

17. The method of claim 13, further comprising simulating the
electronic circuit by:

determining a transfer function of the circuit from characteristics of the
interconnected templates; and

determining a signal output from the circuit in response to a signal
input to the circuit.

18. The method of claim 13 wherein identifying the corresponding
templates comprises identifying the templates such that the interconnection
of the templates represents the electronic circuit having a predetermined
operational characteristic.

19. The method of claim 13, further comprising:

identifying an interface template that defines a hardware interface
that is compatible with a predetermined platform on which the electronic
circuit can be instantiated; and

wherein interconnecting the templates comprises interconnecting the
circuit templates to the interface template such that the interconnection of
the circuit and interface templates defines the electronic circuit.

20. The method of claim 13, further comprising determining
whether the electronic circuit can be instantiated on a predetermined
platform.

21. A method, comprising:

identifying a circuit template that defines a first electronic circuit that is
operable to execute an algorithm;

identifying an interface template that defines a hardware interface
that is compatible with the first electronic circuit; and

interconnecting the identified circuit and interface templates to
generate a definition of a resulting electronic circuit that includes the first
circuit interconnected to the hardware interface.

22. The method of claim 21, further comprising using the definition
to instantiate the resulting electronic circuit on a programmable logic circuit.

23. The method of claim 21, further comprising using the definition
to instantiate the resulting electronic circuit on a programmable logic circuit
having signal pins such that the hardware interface is disposed between the
pins and the first circuit.

24. The method of claim 21, further comprising simulating
operation of the resulting electronic circuit based on information included in
the circuit and interface templates.

25. The method of claim 21, further comprising simulating
operation of the resulting electronic circuit based on information included in
a description file that corresponds to the circuit and interface templates.

26. A computer-readable medium, that when executed by a 
processor, causes the processor to: 

parse an algorithm into a combination of respective smaller
algorithms;

identify a corresponding template for each of the smaller algorithms, 
each template defining a respective circuit that is operable to execute the 
respective smaller algorithm; and 

interconnect the identified templates such that the interconnected
templates define an electronic circuit that is operable to execute the
algorithm.

27. A library, comprising: 

one or more circuit templates that each define a respective circuit
operable to execute a respective algorithm; and

an interface template that defines a hardware layer operable to
interface one of the circuits to pins of a programmable logic circuit when the
layer and the one circuit are instantiated on the programmable logic circuit.

28. The library of claim 27 wherein each circuit template includes 
extensible markup language that describes the respective algorithm. 

29. The library of claim 27 wherein the interface template includes
extensible markup language that describes the hardware layer.

30. The library of claim 27 wherein the programmable logic circuit 
comprises a field-programmable gate array. 

31. The library of claim 27, further comprising a file that describes
a platform with which the programmable logic circuit is compatible.

32. The library of claim 27 wherein the library comprises multiple
circuit templates that define circuits that can be interconnected to form a
resulting circuit that can be instantiated on a programmable logic circuit to
execute an algorithm.

TABLE



Algorithm portion           Template         Latency          Input precision        Output precision
(of equation (2))

sin(x) (portion 190)        114 [illegible]  10 clock cycles  32-bit integer         32-bit floating point
z³ (portion 1S6 [sic])      [illegible]      10 clock cycles  32-bit integer         32-bit integer
cos(z) (portion 184)        114 [illegible]  10 clock cycles  32-bit integer         32-bit floating point
* (multiplication portions  114 [illegible]  10 clock cycles  32-bit floating point  32-bit floating point
  182 and 186)
+ (portion 176)             [illegible]      5 clock cycles   32-bit floating point  32-bit floating point
√ (portion 170)             [illegible]      10 clock cycles  32-bit floating point  32-bit floating point
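
As a rough illustration of how a generator might use template
characteristics such as those tabulated above, the sketch below looks up one
library entry per algorithm portion, checks that each stage's output
precision matches the next stage's input precision, and adds up the
latencies. Everything named here is hypothetical: Template, LIBRARY,
chain_latency, and the dictionary keys are labels invented for the example,
the template reference numerals are not reproduced, and summing latencies
along a serial chain is an assumption about how latency composes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    latency: int          # clock cycles
    in_precision: str
    out_precision: str


INT32 = "32-bit integer"
FP32 = "32-bit floating point"

# Latency and precision values copied from the table; keys are illustrative.
LIBRARY = {
    "sin": Template(10, INT32, FP32),    # portion 190
    "cube": Template(10, INT32, INT32),  # z cubed
    "cos": Template(10, INT32, FP32),    # portion 184
    "mul": Template(10, FP32, FP32),     # portions 182 and 186
    "add": Template(5, FP32, FP32),      # portion 176
    "sqrt": Template(10, FP32, FP32),    # portion 170, read here as a square
                                         # root (an assumption)
}


def chain_latency(portions):
    """Return the total latency of a serial chain of templates.

    Raises KeyError if a portion has no template in LIBRARY and ValueError
    if adjacent stages disagree on precision.
    """
    templates = [LIBRARY[p] for p in portions]
    for upstream, downstream in zip(templates, templates[1:]):
        if upstream.out_precision != downstream.in_precision:
            raise ValueError("precision mismatch between stages")
    return sum(t.latency for t in templates)
```

Under these assumptions, chain_latency(["sin", "mul", "add", "sqrt"])
returns 35 clock cycles, while chain_latency(["sin", "cube"]) raises
ValueError because a floating-point output would feed an integer input.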
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